Genomics Use-case
The genomics use-case involves multiple Faasm functions:
gene/mapper_index[1-n]
- each of these functions handles the mapping for a different chunk of the index.
There will be as many gene/mapper_index functions as there are chunks of the index. A basic division of the human genome is into chromosomes, in which case there will be 24 mapper_index functions. Each of these functions is called once per chunk of reads data.
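The fan-out described above can be sketched as follows. This is only an illustration, not code from the repo: the helper `plan_invocations` is hypothetical, and numbering both index chunks and read chunks from 1 is an assumption based on the `mapper_index[1-n]` naming.

```python
# Hypothetical sketch: enumerate the calls implied by the description above.
# 22 autosomes plus X and Y give the 24 index chunks mentioned in the text.
HUMAN_CHROMOSOMES = [str(i) for i in range(1, 23)] + ["X", "Y"]


def plan_invocations(n_index_chunks, n_read_chunks):
    """Return (function_name, read_chunk) pairs.

    Every mapper_index function is called once per chunk of reads data,
    so the result has n_index_chunks * n_read_chunks entries.
    """
    calls = []
    for read_chunk in range(1, n_read_chunks + 1):
        for index_chunk in range(1, n_index_chunks + 1):
            calls.append((f"mapper_index{index_chunk}", read_chunk))
    return calls


if __name__ == "__main__":
    calls = plan_invocations(len(HUMAN_CHROMOSOMES), 3)
    print(len(calls))  # 24 index chunks * 3 read chunks = 72
```

With a per-chromosome index, the number of calls grows linearly with the number of read chunks (24 calls per chunk).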
You can download the genomics data then upload to your Faasm instance with:
The genomics data is shared via Faasm's shared files rather than directly through shared state.
To build the genomics library to WASM, then build and upload the functions, you can run:
- # Build
- ./bin/clean_genomics.sh
- ./bin/build_genomics.sh
- # Upload
- inv upload.genomics
- inv invoke gene mapper --input=1
- # Invoke in a loop for all read chunks
- inv genomics.mapping
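As a rough picture of what invoking the mapper "in a loop for all read chunks" amounts to, the sketch below builds one `inv invoke` command per read chunk. It is not the implementation of `inv genomics.mapping`. The helper `build_invoke_commands` is hypothetical, and it assumes that `--input` carries the read chunk number and that chunks are numbered from 1.

```python
import shlex


def build_invoke_commands(n_read_chunks, user="gene", function="mapper"):
    """Build one `inv invoke` command per read chunk.

    Assumption: --input is the read chunk number, starting at 1,
    mirroring the single-invocation example `--input=1`.
    """
    return [
        ["inv", "invoke", user, function, f"--input={chunk}"]
        for chunk in range(1, n_read_chunks + 1)
    ]


if __name__ == "__main__":
    # Print the commands rather than running them.
    for cmd in build_invoke_commands(3):
        print(shlex.join(cmd))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` if you want to execute the commands instead of printing them.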
First you need to install libfaasm natively:
Once that's set up, you can run the following:
- ./bin/build_genomics_native.sh
The repo itself then describes how to use this code.
The index and reads only need to be set up once and uploaded to S3. To do this you need a native build of the indexer (described above). Then you can run:
Mapping
To map a reads file you can do the following:
- ./bin/gem-mapper -I data/human_c_20_idx.gem -i data/reads_1.fq -o data/my_output.sam
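If you have several reads files, the command above can be generated per file. The sketch below only uses the `-I`, `-i` and `-o` flags shown in the example. The helper `gem_mapper_command` and the convention of naming the output after the reads file are assumptions for illustration.

```python
from pathlib import Path


def gem_mapper_command(index_file, reads_file, output_dir="data"):
    """Build the gem-mapper argument list for one reads file.

    Assumption: the output is written to <output_dir>/<reads stem>.sam.
    """
    reads = Path(reads_file)
    output = Path(output_dir) / f"{reads.stem}.sam"
    return [
        "./bin/gem-mapper",
        "-I", Path(index_file).as_posix(),  # index file
        "-i", reads.as_posix(),             # input reads
        "-o", output.as_posix(),            # output SAM file
    ]


if __name__ == "__main__":
    cmd = gem_mapper_command("data/human_c_20_idx.gem", "data/reads_1.fq")
    print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from the directory containing `bin/gem-mapper`.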
There are lots of animal genomes at this FTP server.
See the readme for the file layout. More genomes can be added in the download_genome.py script.
This page also has useful data: https://www.ensembl.org/Homo_sapiens/Info/Index (good for individual chromosomes).
- Add a new native toolchain (Settings -> Build, Execution, Deployment -> Toolchains)
- Add a new custom build target (along with a new build tool for make under the "build" field)
- Have it run bin/gem-indexer with the relevant input/output files
Internals
Mapping is handled through mapper.c which calls . For each thread it creates a mapping_stats_t and a mapper_search_t.
Threads are either a mapper_pe_thread or a mapper_se_thread; these are just different types of mapping and also live in . mapper_se_thread is the default.