Genomics Use-case

    The genomics use-case involves multiple Faasm functions:

    • gene/mapper_index[1-n] - each of these functions handles the mapping for a different chunk of the index.

    There will be as many gene/mapper_index functions as there are chunks of the index. A basic division of the human genome is into chromosomes, in which case we will have 24 mapper_index functions. These functions will each get called once per chunk of reads data.
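
    The fan-out described above can be sketched as follows. This is a minimal illustration, assuming one index chunk per chromosome (1-22, X and Y) and a made-up number of read chunks. The function names follow the gene/mapper_index[1-n] pattern; the helper names and the exact numbering format are assumptions, not part of Faasm.

```python
# Hypothetical sketch: enumerate the mapper_index invocations for a
# chromosome-based split of the index. All helper names are illustrative.

# 22 autosomes plus X and Y gives 24 index chunks
CHROMOSOMES = [str(i) for i in range(1, 23)] + ["X", "Y"]


def mapper_index_functions(n_index_chunks):
    """Function names following the gene/mapper_index[1-n] pattern (assumed format)."""
    return ["gene/mapper_index{}".format(i) for i in range(1, n_index_chunks + 1)]


def planned_invocations(n_index_chunks, n_read_chunks):
    """Each index function is called once per chunk of reads data."""
    return [
        (func, read_chunk)
        for read_chunk in range(1, n_read_chunks + 1)
        for func in mapper_index_functions(n_index_chunks)
    ]


if __name__ == "__main__":
    calls = planned_invocations(len(CHROMOSOMES), n_read_chunks=3)
    print(len(calls))  # 24 index chunks x 3 read chunks = 72
```

    The total number of invocations grows as (index chunks) x (read chunks), so a finer split of either the index or the reads increases the number of calls accordingly.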

    You can download the genomics data then upload to your Faasm instance with:

    The genomics data is shared via Faasm's shared files rather than directly through shared state.

    To build the genomics library to WASM, then upload and invoke the functions, you can run:

    # Build
    ./bin/clean_genomics.sh
    ./bin/build_genomics.sh

    # Upload
    inv upload.genomics

    inv invoke gene mapper --input=1

    # Invoke in a loop for all read chunks
    inv genomics.mapping

    First you need to install libfaasm natively:

    Once that's set up, you can run the following:

    ./bin/build_genomics_native.sh

    The repo itself then describes how to use this code.

    The index and reads only need to be set up once and uploaded to S3. To do this you need a native build of the indexer (described above). Then you can run:

    Mapping

    To map a reads file you can do the following:

    ./bin/gem-mapper -I data/human_c_20_idx.gem -i data/reads_1.fq -o data/my_output.sam
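
    When there are several reads files, the same command can be generated per chunk. The sketch below only builds and prints the command lines; it does not run them. The -I, -i and -o flags are taken from the command above. The reads_N.fq naming is extrapolated from data/reads_1.fq, and the output_N.sam naming is purely hypothetical.

```python
# Hypothetical sketch: build one gem-mapper command line per reads chunk.
# File naming beyond data/reads_1.fq is an assumption for illustration.
import shlex


def mapper_command(index_path, reads_path, output_path):
    """Argument list mirroring the gem-mapper invocation shown above."""
    return ["./bin/gem-mapper", "-I", index_path, "-i", reads_path, "-o", output_path]


def commands_for_chunks(index_path, n_read_chunks, data_dir="data"):
    """One command per reads chunk, numbered from 1."""
    commands = []
    for chunk in range(1, n_read_chunks + 1):
        reads_path = "{}/reads_{}.fq".format(data_dir, chunk)
        output_path = "{}/output_{}.sam".format(data_dir, chunk)  # assumed naming
        commands.append(mapper_command(index_path, reads_path, output_path))
    return commands


if __name__ == "__main__":
    for cmd in commands_for_chunks("data/human_c_20_idx.gem", n_read_chunks=3):
        # Quote each argument so the printed line is safe to paste into a shell
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

    Keeping each command as a list of arguments means it could later be passed to subprocess.run without any shell quoting concerns.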

    Lots of animal genomes are available at this FTP server.

    See the readme for the file layout. More genomes can be added in the download_genome.py script.

    This page also has useful data: https://www.ensembl.org/Homo_sapiens/Info/Index (good for individual chromosomes).

    • Add a new native toolchain (Settings -> Build, Execution, Deployment -> Toolchains)
    • Add a new custom build target (along with a new build tool for make under the "build" field)
    • Have it run bin/gem-indexer with the relevant input/output files

    Internals

    Mapping is handled through mapper.c which calls . For each thread it creates a mapping_stats_t and a mapper_search_t.

    Threads are either a mapper_pe_thread or a mapper_se_thread. These are just different types of mapping and also live in . mapper_se_thread is the default.