transposonmapper.mapping

transposonmapper.mapping.add_chromosome_length(coordinates, chr_lengths_cumsum, ref_tid_roman)[source]

This function returns a dictionary that records, for every gene, the chromosome the gene belongs to, the coordinates of its start and end positions, and the direction of the gene.

Parameters
  • coordinates (dict) – This dictionary is the 1st output from the function read_genes(gff_file, essentials_file, gene_names_file)

  • chr_lengths_cumsum (dict) – This dictionary is the 2nd output from the function get_sequence_length(bam)

  • ref_tid_roman (dict) – This dictionary is the output from ref_tid_roman = {key: value for key, value in zip(ref_romannums, ref_tid)}, where ref_romannums = chromosomename_roman_to_arabic()[1] and ref_tid = get_chromosome_names(bam)

Returns

A dictionary that records, for every gene, the chromosome the gene belongs to, the coordinates of its start and end positions, and the direction of the gene.

Return type

dict
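As a rough illustration of the idea, the sketch below moves per-chromosome gene coordinates onto a single genome-wide axis. It does not call transposonmapper. The helper name `shift_gene_coordinates`, the `[chromosome, start, end, direction]` layout per gene, and the `chr_offsets` dictionary keyed directly by Roman numeral are all assumptions made for the example; the real function additionally takes `ref_tid_roman` to translate chromosome names.

```python
def shift_gene_coordinates(coordinates, chr_offsets):
    """Return a copy of `coordinates` with start/end shifted by the chromosome offset.

    coordinates: hypothetical layout {gene: [chromosome, start, end, direction]}
    chr_offsets: hypothetical layout {chromosome: summed length of all previous chromosomes}
    """
    shifted = {}
    for gene, (chrom, start, end, direction) in coordinates.items():
        offset = chr_offsets[chrom]
        shifted[gene] = [chrom, start + offset, end + offset, direction]
    return shifted


# Made-up example data: chromosome I is 1000 bp long, so chromosome II starts at offset 1000.
coordinates = {
    "geneA": ["I", 100, 500, "+"],
    "geneB": ["II", 50, 300, "-"],
}
chr_offsets = {"I": 0, "II": 1000}

shifted = shift_gene_coordinates(coordinates, chr_offsets)
```

Genes on the first chromosome keep their coordinates, while genes on later chromosomes are moved past the end of everything before them.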

transposonmapper.mapping.add_chromosome_length_inserts(coordinates, ref_names, chr_lengths)[source]

For each insertion location, add the length of all previous chromosomes

Parameters
  • coordinates (numpy.array) – Third output from get_reads(bam)

  • ref_names (list) – Output from the following:

    ref_tid = get_chromosome_names(bam)

    ref_names = list(ref_tid.keys())

  • chr_lengths (dict) – First output from get_sequence_length(bam)

Returns

The insertion locations, each shifted by the combined length of all previous chromosomes

Return type

numpy.array
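The following is a minimal sketch of the offsetting step, not the library's implementation. The helper `offset_insertions` and its inputs (a list of chromosome names per insertion plus a separate array of positions) are hypothetical; the actual function receives a single `coordinates` array whose column layout is not described on this page.

```python
import numpy as np


def offset_insertions(chrom_names, positions, ref_names, chr_lengths):
    """Shift each insertion position by the summed length of all previous chromosomes."""
    # Build the offset of every chromosome, following the order given by ref_names.
    offsets = {}
    running_total = 0
    for name in ref_names:
        offsets[name] = running_total
        running_total += chr_lengths[name]

    # Look up the offset for each insertion and add it to the local position.
    per_insertion_offset = np.array([offsets[name] for name in chrom_names])
    return np.asarray(positions) + per_insertion_offset


# Made-up example data.
ref_names = ["chrI", "chrII", "chrIII"]
chr_lengths = {"chrI": 1000, "chrII": 2000, "chrIII": 1500}
chrom_names = ["chrI", "chrII", "chrIII", "chrII"]
positions = np.array([10, 20, 30, 1999])

genome_positions = offset_insertions(chrom_names, positions, ref_names, chr_lengths)
```

Because the offsets depend on the order of `ref_names`, the same ordering has to be used for genes and insertions if the two are later compared.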

transposonmapper.mapping.correct_read_position(flags, start_position, readlength)[source]

Correct starting position for reads with reversed orientation

Parameters
  • flags (numpy.array) – [description]

  • start_position (numpy.array) – [description]

  • readlength (numpy.array) – [description]

Returns

Corrected start positions and flags

Return type

numpy.array
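One plausible form of this correction is sketched below. It assumes that `flags` holds SAM flags (where bit 16 marks a read mapped to the reverse strand) and that the correction consists of adding the read length to the start position of reverse reads. Neither assumption is confirmed by this page, the helper name `shift_reverse_reads` is hypothetical, and the sketch returns only the shifted positions.

```python
import numpy as np

REVERSE_FLAG = 16  # SAM flag bit 0x10: read is mapped to the reverse strand


def shift_reverse_reads(flags, start_position, readlength):
    """Return start positions with the read length added for reverse-strand reads."""
    flags = np.asarray(flags)
    start_position = np.asarray(start_position)
    readlength = np.asarray(readlength)

    # Boolean mask that is True wherever the reverse-strand bit is set.
    is_reverse = (flags & REVERSE_FLAG) != 0

    # Forward reads keep their start; reverse reads are moved to start + length.
    return np.where(is_reverse, start_position + readlength, start_position)


# Made-up example data: reads 2 and 4 are on the reverse strand.
flags = np.array([0, 16, 0, 16])
start_position = np.array([100, 200, 300, 400])
readlength = np.array([50, 50, 75, 75])

corrected = shift_reverse_reads(flags, start_position, readlength)
```

Testing the bit with `&` rather than comparing against the literal value 16 keeps the mask correct if other flag bits happen to be set as well.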

transposonmapper.mapping.find_chromosome_reads(chromosome, N_reads: int)[source]

Find the reads inside each chromosome

Parameters
  • chromosome (str) – The name of the chromosome, either as an Arabic number (1, 2) or as a Roman numeral (“I”, “II”)

  • N_reads (int) –

Return type

flags, start_position, readlength

transposonmapper.mapping.get_insertions_and_reads(coordinates, tn_coordinates, readnumb_array)[source]

This function computes the total number of transposons per gene, the number of reads per gene, and the distribution of transposons along the gene.

Parameters
  • coordinates (dict) – This is the output of the function add_chromosome_length(coordinates, chr_lengths_cumsum, ref_tid_roman)

  • tn_coordinates (numpy.array) – This is the output of the function add_chromosome_length_inserts(coordinates, ref_names, chr_lengths)

  • readnumb_array (numpy.array) – This is the 1st output of the function get_reads(bam)

Returns

  • dict – A dict in which each key is a gene and each value is the total number of transposons found in that gene

  • dict – A dict in which each key is a gene and each value is the total number of reads for all the transposons found in that gene

  • dict

    A dict in which each key is a gene and each value is a list of 4 elements:
    • the chromosome number

    • gene start position

    • gene end position

    • distribution of reads per transposon found inside the gene
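The three returned dictionaries can be pictured with the small sketch below. It uses plain Python lists instead of numpy arrays and does not reproduce the library's code. The helper `summarise_per_gene`, the `[chromosome, start, end]` gene layout, and the choice to treat gene boundaries as inclusive are assumptions made for illustration.

```python
def summarise_per_gene(genes, tn_positions, read_counts):
    """Count insertions and reads per gene from genome-wide positions.

    genes:        hypothetical layout {gene: [chromosome, start, end]}
    tn_positions: genome-wide insertion positions
    read_counts:  number of reads observed at each insertion, same order
    """
    tn_per_gene = {}
    reads_per_gene = {}
    tn_coordinates_per_gene = {}

    for gene, (chrom, start, end) in genes.items():
        # Reads of every insertion that falls inside the gene (boundaries inclusive).
        reads_in_gene = [
            reads
            for position, reads in zip(tn_positions, read_counts)
            if start <= position <= end
        ]
        tn_per_gene[gene] = len(reads_in_gene)
        reads_per_gene[gene] = sum(reads_in_gene)
        tn_coordinates_per_gene[gene] = [chrom, start, end, reads_in_gene]

    return tn_per_gene, reads_per_gene, tn_coordinates_per_gene


# Made-up example data: the insertion at position 700 lies between the two genes.
genes = {"geneA": ["I", 100, 500], "geneB": ["II", 1050, 1300]}
tn_positions = [120, 480, 700, 1100]
read_counts = [3, 5, 2, 7]

tn_per_gene, reads_per_gene, tn_coordinates_per_gene = summarise_per_gene(
    genes, tn_positions, read_counts
)
```

This linear scan over all insertions for every gene is only meant to show the shape of the outputs; for genome-scale data a sorted array with a binary search would be a more realistic choice.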

transposonmapper.mapping.get_reads(bam)[source]

This function retrieves all reads within a specified genomic region.

Parameters

bam – The output of the function pysam.AlignmentFile(bamfile, "rb")

Returns

  • numpy.array – reads per genomic region

  • numpy.array – Array of three columns, where the 2nd one indicates the start position of a transposon

  • numpy.array – A copy of the 2nd output