Topic outline

  • Rationale of the case study

    The case study was proposed by the LifeWatch CT Mediterranean to explore the susceptibility of EUNIS habitats to alien species (AS) invasions. It was originally presented for freshwater and marine habitats (Boggero et al., 2014; Corriero et al., 2016). In the alien species VRE, the case study has been generalized to work with any biodiversity dataset available in the LW data portal. For a full understanding of the rationale of the case study and the statistical approach used, we strongly encourage users to read Boggero et al. (2014) and Corriero et al. (2016) carefully.

    The workflow automates a set of operations that were originally written in an R script. This allows users not skilled in R to replicate the same analyses in Taverna and, if needed, to modify the statistical workflow according to the aims of their research. However, to fully understand what the workflow is doing, knowledge of the statistical tools it uses (Generalized Linear Mixed Models, GLMMs) is a prerequisite. This is particularly relevant for two aspects: understanding whether your data are suitable for this kind of analysis (i.e. fit the assumptions of GLMMs) and interpreting the output.


    • Statistical workflow

      A statistical workflow was defined to apply statistical tools suited to the occurrence data provided by LifeWatch. The R statistical environment was used to analyse the data and produce the statistical workflow. The original R script (available to expert users who want to use and/or modify it) was incorporated into a Taverna workflow to make it easier to use for users unfamiliar with R. The fact that the R script can be run within a Taverna workflow does not mean that this is a “black box” style analysis. Knowledge of statistics and of GLMM assumptions and limitations is always necessary.

    • Taverna workflow

      In order to use the statistical workflow in Taverna, the R script was subdivided into its main components, which were then incorporated in Taverna as a set of 4 sub-workflows. These sub-workflows were concatenated to replicate the analyses provided by Corriero et al. (2016). Users can change the order of the sub-workflows, or remove one or more of them, to customize the workflow according to the aims of their research. New modules will be made available in the near future.

      Open Taverna and start the workflow.

      After the workflow starts, Taverna opens a web application in the browser, through which the user can query the LifeWatch data portal, obtain a list of matching datasets with their metadata, and select the dataset of interest.

      After choosing the dataset, the user can press the “Use in Taverna” button and close the browser.

      From this point on, no further interaction is required from the user, and the workflow automatically performs 3 main tasks:

      1) Reshape the dataset to obtain alien species and native species richness for each family at the habitat and site level. If more than one EUNIS habitat is present in a site, richness is calculated separately for each habitat in the site.

      2) Model fitting and best-model selection. This sub-workflow calls a set of R functions from the packages lme4 and MuMIn. Initially, a full GLMM is fitted, including both richness and level-1 EUNIS habitat as fixed factors. Subsequently, reduced models are fitted and compared with the full model using the Akaike Information Criterion (AIC). The model with the best (lowest) AIC is used to create the output (tables and graph).

      3) Finally, a sub-workflow takes information from the reshaped dataset and plots rarefaction curves.
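
      The logic of the three tasks above can be sketched outside Taverna as follows. This is a minimal illustration in Python, not the workflow's R script. The record layout, the function names (richness_table, aic, best_by_aic, rarefied_richness), the species and model names, and the log-likelihood values are all assumptions made for the example. The use of individual-based (Hurlbert) rarefaction is also an assumption, since the type of rarefaction is not specified here. In the real workflow the GLMMs are fitted in R with lme4 and MuMIn; in this sketch the log-likelihoods are made-up inputs that only show how AIC ranks candidate models.

```python
from collections import defaultdict
from math import comb

# Hypothetical occurrence records: (site, EUNIS habitat, family, species, is_alien).
records = [
    ("S1", "A", "Mytilidae", "sp1", True),
    ("S1", "A", "Mytilidae", "sp2", False),
    ("S1", "A", "Mytilidae", "sp2", False),  # repeated occurrence, counted once
    ("S1", "B", "Mytilidae", "sp3", False),  # second habitat in the same site
    ("S2", "A", "Serpulidae", "sp4", True),
    ("S2", "A", "Serpulidae", "sp5", True),
]


def richness_table(rows):
    """Task 1: distinct alien and native species per (site, habitat, family)."""
    species = defaultdict(lambda: (set(), set()))
    for site, habitat, family, name, is_alien in rows:
        alien, native = species[(site, habitat, family)]
        (alien if is_alien else native).add(name)
    return {key: (len(a), len(n)) for key, (a, n) in species.items()}


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(models):
    """Task 2: name of the candidate model with the lowest AIC.

    models maps a model name to (log_likelihood, number_of_parameters).
    """
    return min(models, key=lambda name: aic(*models[name]))


def rarefied_richness(abundances, n):
    """Task 3: expected species count in a random subsample of n individuals.

    Individual-based rarefaction: E[S_n] = sum_i (1 - C(N - N_i, n) / C(N, n)),
    where N_i is the abundance of species i and N the total abundance.
    """
    total = sum(abundances)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total abundance")
    return sum(1 - comb(total - a, n) / comb(total, n) for a in abundances)


# Made-up log-likelihoods and parameter counts for a full and reduced models.
candidates = {
    "full": (-120.0, 5),           # richness + habitat
    "richness_only": (-121.0, 3),
    "habitat_only": (-130.0, 4),
    "null": (-135.0, 2),
}

for key, (n_alien, n_native) in sorted(richness_table(records).items()):
    print(key, "alien:", n_alien, "native:", n_native)

print("selected model:", best_by_aic(candidates))

# One point per subsample size gives the rarefaction curve for a community.
curve = [rarefied_richness([5, 3, 2], n) for n in range(0, 11)]
print(curve)
```

      With these made-up inputs, a reduced model is selected over the full one because its smaller number of parameters outweighs its slightly lower log-likelihood. This is the trade-off that AIC-based selection is meant to capture.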