|
SeqAn3 3.4.0
The Modern C++ library for sequence analysis.
|
Learning Objective:
In this tutorial, you will learn how to combine the components of previous tutorials to create your very first SeqAn application: a read mapper!
| Difficulty | High |
|---|---|
| Duration | 90 Minutes |
| Prerequisite tutorials | All |
| Recommended reading |
Read mapping is a common task in bioinformatics and is often the first step of an in-depth analysis of Next Generation Sequencing data. It aims to identify positions where a query sequence (read) matches with up to e errors to a reference sequence.
In this example, we will implement a read mapper step by step and make use of what we have learned in the previous tutorials. As it is common practice with read mappers, we will first create an indexer that creates an index from the reference and stores it to disk. After this, we will implement the actual read mapper that will use the stored index and map the reads.
We provide an example reference and an example query file.
As a first step, we want to parse command line arguments for our indexer. If you get into trouble, you can take a peek at the Argument Parser Tutorial or the API documentation of the seqan3::argument_parser for help.
Follow the best practice and create:
Use validators where applicable!
Your main may look like this:
As a next step, we want to use the parsed file name to read in our reference data. This will be done using the seqan3::sequence_file_input class. As a guide, you can take a look at the Sequence I/O Tutorial.
To accomplish this, you should create:
You should also perform the following changes in run_program:
This is the signature of read_reference:
This is the reference_storage_t:
Here is the complete program:
Now that we have the necessary sequence information, we can create an index and store it. Read up on the Index Tutorial if you have any questions.
We also need to change:
This is the signature of create_index:
Here is the complete program:
Again, we want to parse command line arguments for our read mapper as a first step. If you get into trouble, you can take a peek at the Argument Parser Tutorial or the API documentation of the seqan3::argument_parser for help.
Follow the best practice and create:
Use validators where applicable!
Your main may look like this:
We also want to read the reference in the read mapper. This is done the same way as for the indexer. We can now load the index and conduct a search. Read up on the Search Tutorial if you have any questions.
To accomplish this, you should:
You should also perform the following changes in run_program:
This is the signature of map_reads:
Here is the complete program:
We can now use the obtained positions to align each query against the reference. Refer to the Alignment Tutorial if you have any questions.
This is the alignment config:
Here is the complete program:
Finally, we can write our results into a SAM file.
Try to write all available information into the SAM file. We can introduce a naive mapping quality by using mapping quality = 60 + Sequence Alignment score.
This is the sam_file_output construction:
Here is the complete program: