Outputs
Output for ‘sparse predict’
The taxonomic profiling results of ‘sparse predict’ are saved in <workspace>/profile.txt.
The first three rows in ‘profile.txt’ summarize the status of the reads from the metagenomic sample:
Total <No. reads> <No. matched reads>
Unmatched <% unmatched reads in total reads> 0.000
Uncertain_match <% unreliable matches in total reads> <% unreliable matches in total matches>
Example:
Total 26726530 23783388.0
Unmatched 11.012 0.000
Uncertain_match 36.102 40.570
In this example, about 89% of the reads (100 - 11.012) match at least one reference in the database, and about 59% of all matches found (100 - 40.570) are used for taxonomic predictions.
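The arithmetic behind these percentages can be checked with a short Python sketch. It is only an illustration: the function name `summarize_profile_header` is hypothetical and not part of SPARSE, and the sketch assumes that the three summary rows are whitespace-separated and appear in the order shown above.

```python
def summarize_profile_header(lines):
    """Parse the three summary rows of profile.txt (hypothetical helper)."""
    rows = {}
    for line in lines[:3]:
        # Each summary row is: <label> <value 1> <value 2>
        label, first, second = line.split()
        rows[label] = (float(first), float(second))
    total_reads, matched_reads = rows["Total"]
    return {
        "total_reads": total_reads,
        "matched_reads": matched_reads,
        # Share of all reads that matched at least one reference.
        "pct_matched": 100.0 - rows["Unmatched"][0],
        # Share of all matches that are not flagged as uncertain.
        "pct_matches_used": 100.0 - rows["Uncertain_match"][1],
    }


example = [
    "Total 26726530 23783388.0",
    "Unmatched 11.012 0.000",
    "Uncertain_match 36.102 40.570",
]
summary = summarize_profile_header(example)
print(round(summary["pct_matched"], 3))
print(round(summary["pct_matches_used"], 3))
```

The matched percentage can also be cross-checked against the first row, since the number of matched reads divided by the total number of reads should give the same value.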
The following lines describe the prediction at different taxonomic levels, in the following format:
<SPARSE group> <% in total reads> <% in matched reads> <taxonomic labels> (<reference IDs>)
Example:
~154 2.1706 2.4392 Bacteria|-|Actinobacteria|Actinobacteria|Micrococcales|Micrococcaceae (15969,66991,66935,66915,67189,110179,40981,154,67166,67220,114405,66878,66930,82153,40861,40710,67029)
u154 2.1701 2.4387 Bacteria|-|Actinobacteria|Actinobacteria|Micrococcales|Micrococcaceae|Rothia (15969,66991,66935,66915,67189,110179,40981,154,67166,67220,114405,66878,66930,82153,40861,40710,67029)
s154 2.1551 2.4217 Bacteria|-|Actinobacteria|Actinobacteria|Micrococcales|Micrococcaceae|Rothia|Rothia dentocariosa (*Rothia sp. HMSC067H10/*Rothia sp. HMSC064D08/*Rothia sp. HMSC071F11/*Rothia sp. HMSC069C01) (15969,66991,66935,66915,67189,110179,40981,154,67166,67220,114405,66878,66930,82153,40861,40710,67029)
~613 1.4988 1.6843 Bacteria|-|Firmicutes|Negativicutes|Veillonellales|Veillonellaceae (16778,16416,117596,16415,10931,17276,113949,60730,613)
u613 1.4934 1.6782 Bacteria|-|Firmicutes|Negativicutes|Veillonellales|Veillonellaceae|Veillonella (16778,16416,117596,16415,10931,17276,113949,60730,613)
s613 1.4507 1.6302 Bacteria|-|Firmicutes|Negativicutes|Veillonellales|Veillonellaceae|Veillonella|Veillonella parvula (*Veillonella sp. 6_1_27/*Veillonella sp. S13054-11/*Veillonella sp. 3_1_44) (16778,16416,117596,16415,10931,17276,113949,60730,613)
r15969 0.4677 0.5256 Bacteria|-|Actinobacteria|Actinobacteria|Micrococcales|Micrococcaceae|Rothia|Rothia dentocariosa|- (15969)
p15969 0.3907 0.4391 Bacteria|-|Actinobacteria|Actinobacteria|Micrococcales|Micrococcaceae|Rothia|Rothia dentocariosa|-|Rothia dentocariosa M567: GCF_000143585.1 (15969)
r16416 0.1838 0.2065 Bacteria|-|Firmicutes|Negativicutes|Veillonellales|Veillonellaceae|Veillonella|*Veillonella sp. 6_1_27|- (16416)
p16416 0.1631 0.1833 Bacteria|-|Firmicutes|Negativicutes|Veillonellales|Veillonellaceae|Veillonella|*Veillonella sp. 6_1_27|-|Veillonella sp. 6_1_27: GCF_000163735.1 (16416)
The SPARSE groups are internal hierarchical clustering results stored in the SPARSE database. Each group label consists of two components: the prefix represents the ANI level of the cluster, and the number that follows is the designation of the cluster.
For example, ‘s613’ is cluster ‘613’ at the ‘s’ level (ANI 95%, “species level”). The correspondence between prefix and ANI level is:
~ <90% ANI
u 90% ANI
s 95% ANI
r 98% ANI
p 99% ANI
n 99.5% ANI
m 99.8% ANI
e 99.9% ANI
c 99.95% ANI
a 100% ANI
‘s’ (ANI 95%) is normally treated as a ‘gold standard’ criterion for species definition.
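As a minimal sketch of how a group label could be decoded in a downstream script, the prefix table above can be written as a Python dictionary. The names `ANI_LEVELS` and `split_group` are illustrative assumptions and do not come from SPARSE itself.

```python
# Prefix-to-ANI mapping copied from the table above.
ANI_LEVELS = {
    "~": "<90%",
    "u": "90%",
    "s": "95%",
    "r": "98%",
    "p": "99%",
    "n": "99.5%",
    "m": "99.8%",
    "e": "99.9%",
    "c": "99.95%",
    "a": "100%",
}


def split_group(label):
    """Split a SPARSE group label into (prefix, cluster number, ANI level)."""
    prefix, cluster = label[0], label[1:]
    if prefix not in ANI_LEVELS or not cluster.isdigit():
        raise ValueError(f"not a SPARSE group label: {label!r}")
    return prefix, int(cluster), ANI_LEVELS[prefix]


print(split_group("s613"))
print(split_group("~154"))
```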
For each SPARSE group, the traditional taxonomic labels follow the format:
<superkingdom>|<kingdom>|<phylum>|<class>|<order>|<family>|<genus>|<species>|<subspecies>|<reference_genome>
These taxonomic labels are summarised from the input database. Sometimes multiple species will be associated with one SPARSE group:
s613 1.4507 1.6302 Bacteria|-|Firmicutes|Negativicutes|Veillonellales|Veillonellaceae|Veillonella|Veillonella parvula (*Veillonella sp. 6_1_27/*Veillonella sp. S13054-11/*Veillonella sp. 3_1_44) (16778,16416,117596,16415,10931,17276,113949,60730,613)
In this example, group s613 is associated with four different species:
Veillonella parvula
*Veillonella sp. 6_1_27
*Veillonella sp. S13054-11
*Veillonella sp. 3_1_44
Informal names are marked with the prefix “*”. The most probable species is shown first, followed by the other three names in parentheses. A second pair of parentheses follows the taxonomic labels:
(16778,16416,117596,16415,10931,17276,113949,60730,613)
These are the IDs of the actual reference genomes that were found in the database. They can be used to extract reference-specific reads using the command ‘sparse extract’.
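The pieces described above (group, two percentages, taxonomic labels, alternative species names, and reference IDs) can be pulled apart with a small Python sketch. This is not SPARSE code: `parse_profile_line`, `RANKS` and the two regular expressions are hypothetical, and the sketch assumes that the first three fields are separated by whitespace, that taxon names contain neither “|” nor “/”, and that the line ends with the parenthesised list of reference IDs.

```python
import re

# Rank names taken from the label format described above.
RANKS = [
    "superkingdom", "kingdom", "phylum", "class", "order",
    "family", "genus", "species", "subspecies", "reference_genome",
]

# Taxonomic labels followed by a final "(id,id,...)" group of digits.
LINE_RE = re.compile(r"^(?P<labels>.*?)\s*\((?P<ids>\d+(?:,\d+)*)\)\s*$")
# Optional "(name/name/...)" group appended to the deepest label.
ALT_RE = re.compile(r"^(?P<main>.*?)\s*\((?P<alts>.+)\)$")


def parse_profile_line(line):
    """Parse one prediction line of profile.txt (hypothetical helper)."""
    group, pct_total, pct_matched, rest = line.split(None, 3)
    match = LINE_RE.match(rest)
    if match is None:
        raise ValueError(f"unexpected line format: {line!r}")
    labels = match.group("labels").split("|")
    alternatives = []
    alt = ALT_RE.match(labels[-1])
    if alt:
        # Keep the most probable name as the label, the rest as alternatives.
        labels[-1] = alt.group("main")
        alternatives = alt.group("alts").split("/")
    return {
        "group": group,
        "pct_total": float(pct_total),
        "pct_matched": float(pct_matched),
        "taxonomy": dict(zip(RANKS, labels)),
        "alternatives": alternatives,
        "ref_ids": [int(i) for i in match.group("ids").split(",")],
    }


line = (
    "s613 1.4507 1.6302 Bacteria|-|Firmicutes|Negativicutes|Veillonellales|"
    "Veillonellaceae|Veillonella|Veillonella parvula (*Veillonella sp. 6_1_27/"
    "*Veillonella sp. S13054-11/*Veillonella sp. 3_1_44) "
    "(16778,16416,117596,16415,10931,17276,113949,60730,613)"
)
record = parse_profile_line(line)
print(record["taxonomy"]["species"], record["alternatives"], record["ref_ids"])
```

The resulting `ref_ids` list holds the reference IDs that the text above says can be passed to ‘sparse extract’. How they are supplied on the command line is not covered here.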
Output for ‘sparse report’
‘sparse report’ combines the results of multiple ‘sparse predict’ runs into a single tab-delimited text file. This command also identifies potential pathogens in the predictions.
#Group #Pathogenic ERR1659111 ERR1659110 #Species #Taxon
s3080 non 4.47309775569 4.84028327303 Actinomyces dentalis (*Actinomyces sp. oral taxon 414) Bacteria|-|Actinobacteria|Actinobacteria|Actinomycetales|Actinomycetaceae|Actinomyces|Actinomyces dentalis (*Actinomyces sp. oral taxon 414)
s1438 non 0.821962806352 3.57658189557 Desulfomicrobium orale Bacteria|-|Proteobacteria|Deltaproteobacteria|Desulfovibrionales|Desulfomicrobiaceae|Desulfomicrobium|Desulfomicrobium orale
s9975 non 2.04489272864 1.85184148971 *Anaerolineaceae bacterium oral taxon 439 Bacteria|-|Chloroflexi|Anaerolineae|Anaerolineales|Anaerolineaceae|-|*Anaerolineaceae bacterium oral taxon 439
s939 non 1.81538010098 0.712860400235 Pseudopropionibacterium propionicum Bacteria|-|Actinobacteria|Actinobacteria|Propionibacteriales|Propionibacteriaceae|Pseudopropionibacterium|Pseudopropionibacterium propionicum
s8820 non 1.67063037869 0.491279312566 *Ottowia sp. Marseille-P4747 (*Ottowia sp. oral taxon 894) Bacteria|-|Proteobacteria|Betaproteobacteria|Burkholderiales|Comamonadaceae|Ottowia|*Ottowia sp. Marseille-P4747 (*Ottowia sp. oral taxon 894)
s2215 non 1.31802856115 0.34575838713 Lautropia mirabilis Bacteria|-|Proteobacteria|Betaproteobacteria|Burkholderiales|Burkholderiaceae|Lautropia|Lautropia mirabilis
s2590 non 0.665641018802 0.612783437737 Actinomyces cardiffensis Bacteria|-|Actinobacteria|Actinobacteria|Actinomycetales|Actinomycetaceae|Actinomyces|Actinomyces cardiffensis
s2189 non 0.87220732902 0.296597041195 Corynebacterium matruchotii Bacteria|-|Actinobacteria|Actinobacteria|Corynebacteriales|Corynebacteriaceae|Corynebacterium|Corynebacterium matruchotii
s108979 non 0.295928369726 0.857545958706 *Actinomyces sp. oral taxon 897 Bacteria|-|Actinobacteria|Actinobacteria|Actinomycetales|Actinomycetaceae|Actinomyces|*Actinomyces sp. oral taxon 897
The first line lists the samples in the report, together with additional annotation columns (those starting with ‘#’). #Group and #Taxon are identical to the ‘sparse predict’ output. #Species is a simple extraction of the most probable species from the #Taxon column, and #Pathogenic contains potential pathogen predictions encoded as:
non - not a pathogen
* - commensal and normally not a pathogen
** - possibly a pathogen
*** - pathogen
**** - important pathogen, possibly fatal
The numbers show the abundance of each species in each metagenomic read set. They are normally given as percentages, unless the parameter ‘--absolute’ is applied, which changes them to absolute read counts.
The last row of the output is a summary of all unknown/uncertain reads without taxonomic classifications.
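A report in this layout could be loaded for further analysis with Python's standard `csv` module, as in the sketch below. The helper `read_report` and the dictionary `PATHOGEN_CODES` are illustrative names, not part of SPARSE. The sketch assumes that every row, including the final summary row, carries one numeric value per sample column, and it treats any header field that does not start with ‘#’ as a sample name.

```python
import csv
import io

# Meaning of the #Pathogenic codes, as listed above.
PATHOGEN_CODES = {
    "non": "not a pathogen",
    "*": "commensal and normally not a pathogen",
    "**": "possibly a pathogen",
    "***": "pathogen",
    "****": "important pathogen, possibly fatal",
}


def read_report(handle):
    """Read a tab-delimited report into (sample names, list of row dicts)."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    # Annotation columns start with '#'; everything else is a sample.
    samples = [name for name in header if not name.startswith("#")]
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        record = dict(zip(header, fields))
        for sample in samples:
            record[sample] = float(record[sample])
        rows.append(record)
    return samples, rows


demo = "\n".join([
    "\t".join(["#Group", "#Pathogenic", "ERR1659111", "ERR1659110",
               "#Species", "#Taxon"]),
    "\t".join(["s2215", "non", "1.31802856115", "0.34575838713",
               "Lautropia mirabilis",
               "Bacteria|-|Proteobacteria|Betaproteobacteria|Burkholderiales|"
               "Burkholderiaceae|Lautropia|Lautropia mirabilis"]),
])
samples, rows = read_report(io.StringIO(demo))
for row in rows:
    print(row["#Group"], PATHOGEN_CODES[row["#Pathogenic"]],
          [row[s] for s in samples])
```

In practice the `io.StringIO` object would be replaced by an open file handle on the report written by ‘sparse report’.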