Kraken

Taxonomic classification tool that uses exact k-mer matches to find the lowest common ancestor (LCA) of a given sequence.

https://ccb.jhu.edu/software/kraken/

The MultiQC module supports outputs from both Kraken and Kraken 2.

It works with report files generated with the --report flag, which look like the following:

11.66    98148   98148   U       0       unclassified
88.34    743870  996     -       1       root
88.22    742867  0       -       131567    cellular organisms
88.22    742866  2071    D       2           Bacteria
87.95    740514  2914    P       1239          Firmicutes
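
As a rough illustration of how one line of such a report could be read, the sketch below splits a line into its six columns: percentage of fragments in the clade, fragments in the clade, fragments assigned directly to the taxon, rank code, NCBI taxonomy ID, and the indented scientific name. The names `ReportRow` and `parse_report_line` are hypothetical and are not part of MultiQC or Kraken. The sketch assumes the six-column layout only and does not handle reports that carry extra columns.

```python
from typing import NamedTuple, Optional


class ReportRow(NamedTuple):
    """One line of a six-column Kraken report (hypothetical helper type)."""
    percent: float      # percentage of fragments in the clade rooted here
    clade_count: int    # fragments in the clade rooted at this taxon
    direct_count: int   # fragments assigned directly to this taxon
    rank: str           # rank code, e.g. U, D, P, or "-"
    tax_id: int         # NCBI taxonomy ID
    name: str           # scientific name with indentation removed


def parse_report_line(line: str) -> Optional[ReportRow]:
    """Parse one report line; return None if it does not have six columns."""
    # Split on any whitespace at most five times so that names containing
    # spaces (e.g. "cellular organisms") stay in one piece.
    fields = line.split(None, 5)
    if len(fields) != 6:
        return None
    pct, clade, direct, rank, tax_id, name = fields
    return ReportRow(float(pct), int(clade), int(direct), rank, int(tax_id), name.strip())
```

Splitting on arbitrary whitespace means the same function accepts both tab-separated and space-aligned lines, at the cost of discarding the indentation that encodes taxonomic depth.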

A bar graph is generated that shows, for each sample, the number of fragments that fall into the top 5 categories at each taxonomic rank. The top categories are chosen by summing the library percentages across all samples and taking the highest totals.
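
The ranking described here can be sketched as follows. This is a minimal illustration under assumptions, not the module's actual code: the function name `top_taxa` and the input layout (a dict mapping each sample name to a list of `(percent, rank_code, taxon_name)` tuples) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input layout: sample name -> [(percent, rank_code, taxon_name), ...]
Samples = Dict[str, List[Tuple[float, str, str]]]


def top_taxa(samples: Samples, rank: str, top_n: int = 5) -> List[str]:
    """Return the top_n taxon names at `rank`, ordered by summed percentage."""
    totals: Dict[str, float] = defaultdict(float)
    for rows in samples.values():
        for percent, rank_code, name in rows:
            # Only taxa at the requested rank compete with each other.
            if rank_code == rank:
                totals[name] += percent
    # Highest summed percentage first.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:top_n]]
```

Because the totals are summed over all samples, a taxon that is abundant in only one sample can still rank above one that is moderately present everywhere.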

The number of top categories to plot can be customized in the config file:

kraken:
  top_n: 5

File search patterns

kraken:
  contents_re: ^\s{0,2}(\d{1,3}\.\d{1,2})\t(\d+)\t(\d+)\t((\d+)\t(\d+)\t)?([URDKPCOFGS-]\d{0,2})\t(\d+)(\s+)unclassified
  num_lines: 1
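
To see what this pattern accepts, the sketch below applies the same regular expression to a single line with Python's `re` module. It only approximates the check: how MultiQC applies `contents_re` internally is not shown here, and the helper name `looks_like_kraken_report` is hypothetical. With `num_lines: 1`, only the first line of a file is assumed to be tested.

```python
import re

# The contents_re pattern from the config above, copied verbatim.
CONTENTS_RE = re.compile(
    r"^\s{0,2}(\d{1,3}\.\d{1,2})\t(\d+)\t(\d+)\t((\d+)\t(\d+)\t)?"
    r"([URDKPCOFGS-]\d{0,2})\t(\d+)(\s+)unclassified"
)


def looks_like_kraken_report(first_line: str) -> bool:
    """Return True if the first line of a file matches the search pattern."""
    return CONTENTS_RE.search(first_line) is not None
```

The pattern requires tab-separated columns, a percentage with a decimal point, and the taxon name "unclassified", so only a report whose first line is the unclassified row is matched. The optional group `((\d+)\t(\d+)\t)?` allows for two extra numeric columns between the fragment counts and the rank code.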
