36 0.40 0.26 0.32 0.28 0.34 0.28
1 The 16S rRNA gene and tDNA were identified by the WebMGA pipeline. The table shows general read-based information for the metagenomes.

Rarefaction curves for the most detailed taxonomic level in MEGAN (including all taxa) began to level off, departing from a straight line, at 10% of the metagenome size, indicating that the most abundant taxa were accounted for (Additional
file 3: Figure S2). Between 1259 (Tpm2) and 1619 (Tpm1-2) taxa were detected in each metagenome at this level. At the genus level, the rarefaction curves almost leveled out, with 729 (Tpm1-1) to 808 (Tpm1-2) taxa detected, indicating good coverage of the taxonomic richness. Estimated genome sizes (EGS) for the seven samples were all in the same range and varied
between 4.6 (Tpm2) and 5.1 (Tplain) Mbp (Table 2). The fraction of reads assigned to specific genes or functions is therefore assumed to be comparable between the metagenomes. The estimated probability (per read) of sequencing a random gene of 1000 bases was 0.0002, and between 181 and 199 hits could be expected in each metagenome, assuming the gene was present in one copy in all organisms [26]. The most abundant genes of the communities are therefore likely to be accounted for in our metagenomes. Specific genes of interest, present in only small fractions of the community, could, however, still be missed by chance.

We also analyzed the taxonomy based on extracted reads assigned to the 16S rRNA gene to see if these results were consistent with the results obtained from the complete metagenomes. The number of reads assigned to the 16S rRNA gene ranged from 658 (Tpm2) to 1288 (Tpm1-2), accounting for approximately 0.1% of the reads (Table 2). As expected, rarefaction curves based on these reads were still increasing steeply at the genus level, where only 80 (Tpm2) to 130 (Tpm1-2) taxa were detected (results not shown). Unless otherwise
specified, the taxonomic results discussed in the following text are based on total reads.

Geochemical, taxonomic and metabolic clustering

Due to the complexity of the metagenomes and geochemical data, we performed an exploratory principal component analysis (PCA) to get an overview of how the samples clustered and which parameters tended to co-occur. The ordination analysis was based on the metagenomic data (taxonomic binning at the phylum level and metabolic annotation at level I SEED subsystems). The geochemical data were then fitted onto the ordination using the envfit function of the vegan library in R. The squared correlation coefficient (r²) showed that all geochemical parameters with p-values ≤ 0.1 had a high goodness of fit (Additional file 4: Table S2). The PCA plot shows that the two Oslofjord samples (OF1 and OF2) were highly similar and positioned in the top right quadrant (Figure 3A). All the Troll pockmark samples were positioned in the bottom half of the plot.
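
To illustrate what fitting a geochemical parameter onto an ordination involves, the following is a minimal sketch in Python of the general vector-fitting idea: regress one environmental variable on the two ordination axes, report the squared correlation coefficient, and estimate a p-value by permuting the variable among samples. It is not the vegan envfit implementation, and it assumes the PCA sample scores have already been computed. The function name fit_vector, its parameters, and all numbers in the demo are hypothetical and are not taken from the study.

```python
import random


def fit_vector(scores, env, permutations=999, seed=0):
    """Fit one environmental variable onto a 2-D ordination (illustrative sketch).

    scores: list of (axis1, axis2) sample scores from an ordination such as PCA.
    env:    list of values of one geochemical parameter, in the same sample order.
    Returns (direction, r2, p_value), where direction is a unit vector.
    """
    n = len(scores)
    # Center the ordination axes once; they stay fixed under permutation.
    mean1 = sum(s[0] for s in scores) / n
    mean2 = sum(s[1] for s in scores) / n
    x1 = [s[0] - mean1 for s in scores]
    x2 = [s[1] - mean2 for s in scores]
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("ordination axes are collinear or constant")

    def regress(y):
        # Ordinary least squares of y on the two centered axes.
        mean_y = sum(y) / n
        yc = [v - mean_y for v in y]
        s1y = sum(a * v for a, v in zip(x1, yc))
        s2y = sum(b * v for b, v in zip(x2, yc))
        b1 = (s22 * s1y - s12 * s2y) / det
        b2 = (s11 * s2y - s12 * s1y) / det
        ss_total = sum(v * v for v in yc)
        if ss_total == 0:
            raise ValueError("environmental variable is constant")
        ss_fitted = b1 * s1y + b2 * s2y
        return (b1, b2), ss_fitted / ss_total

    (b1, b2), r2 = regress(env)
    length = (b1 * b1 + b2 * b2) ** 0.5
    direction = (b1 / length, b2 / length) if length > 0 else (0.0, 0.0)

    # Permutation test: shuffle the variable among samples and count how
    # often the permuted fit is at least as good as the observed one.
    rng = random.Random(seed)
    shuffled = list(env)
    at_least_as_good = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if regress(shuffled)[1] >= r2:
            at_least_as_good += 1
    p_value = (at_least_as_good + 1) / (permutations + 1)
    return direction, r2, p_value


if __name__ == "__main__":
    # Hypothetical scores for seven samples and one made-up parameter.
    demo_scores = [(2.1, 1.8), (1.9, 2.0), (-0.5, -0.4), (-1.2, -0.9),
                   (-0.3, -1.5), (-1.6, -0.2), (-0.4, -0.8)]
    demo_env = [5.0, 4.6, 1.2, 0.8, 1.9, 0.3, 1.1]
    print(fit_vector(demo_scores, demo_env))
```

With only seven samples, a permutation p-value of this kind is coarse, so it is best read as a rough screen rather than a precise significance estimate.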