transposonmapper.mapping
- transposonmapper.mapping.add_chromosome_length(coordinates, chr_lengths_cumsum, ref_tid_roman)
Returns a dictionary that gives, for every gene, the chromosome the gene belongs to, its start and end coordinates, and its direction.
- Parameters
coordinates (dict) – The 1st output of read_genes(gff_file, essentials_file, gene_names_file)
chr_lengths_cumsum (dict) – The 2nd output of get_sequence_length(bam)
ref_tid_roman (dict) – The result of ref_tid_roman = {key: value for key, value in zip(ref_romannums, ref_tid)}, where ref_romannums = chromosomename_roman_to_arabic()[1] and ref_tid = get_chromosome_names(bam)
- Returns
A dictionary that gives, for every gene, the chromosome the gene belongs to, its start and end coordinates, and its direction.
- Return type
dict
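A minimal sketch of the kind of dictionary this entry describes. It assumes, from the parameter names rather than from the library source, that each gene's start and end are shifted by the summed length of the preceding chromosomes. The helper name `offset_gene_coordinates`, the chromosome names, and all sample values are hypothetical.

```python
def offset_gene_coordinates(coordinates, chr_lengths_cumsum, ref_tid_roman):
    """Hypothetical stand-in, not the library implementation.

    coordinates: {gene: [chromosome (roman numeral), start, end, direction]}
    chr_lengths_cumsum: {chromosome name: summed length of all previous chromosomes}
    ref_tid_roman: {roman numeral: chromosome name used in chr_lengths_cumsum}
    """
    shifted = {}
    for gene, (chrom, start, end, direction) in coordinates.items():
        # Look up how far this chromosome sits from the start of the genome.
        offset = chr_lengths_cumsum[ref_tid_roman[chrom]]
        shifted[gene] = [chrom, start + offset, end + offset, direction]
    return shifted


# Made-up values, for illustration only.
coordinates = {
    "geneA": ["I", 100, 400, "+"],
    "geneB": ["II", 50, 250, "-"],
}
chr_lengths_cumsum = {"chr1": 0, "chr2": 1000}
ref_tid_roman = {"I": "chr1", "II": "chr2"}

genome_coordinates = offset_gene_coordinates(
    coordinates, chr_lengths_cumsum, ref_tid_roman
)
```

In this sketch a gene on the first chromosome keeps its coordinates, while a gene on the second chromosome is shifted by the length of the first.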
- transposonmapper.mapping.add_chromosome_length_inserts(coordinates, ref_names, chr_lengths)
For each insertion location, add the length of all previous chromosomes
- Parameters
coordinates (numpy.array) – Third output from get_reads(bam)
ref_names (list) – Output from the following:
ref_tid = get_chromosome_names(bam)
ref_names = list(ref_tid.keys())
chr_lengths (dict) – First output from get_sequence_length(bam)
- Returns
The insertion locations, each with the length of all previous chromosomes added
- Return type
numpy.array
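The offsetting step can be sketched as follows. This is an illustration under stated assumptions, not the library code: it uses plain lists instead of numpy arrays, it assumes each insertion is a (chromosome index, position) pair whose index refers to `ref_names`, and the helper name `offset_insertions` and the sample values are hypothetical.

```python
from itertools import accumulate


def offset_insertions(insertions, ref_names, chr_lengths):
    """Hypothetical stand-in: add the length of all previous chromosomes.

    insertions: list of (chromosome index, position) pairs (assumed layout)
    ref_names: chromosome names in genome order
    chr_lengths: {chromosome name: length}
    """
    lengths = [chr_lengths[name] for name in ref_names]
    # offsets[i] is the summed length of chromosomes 0 .. i-1.
    offsets = [0] + list(accumulate(lengths))[:-1]
    return [position + offsets[index] for index, position in insertions]


# Made-up values, for illustration only.
ref_names = ["chrA", "chrB", "chrC"]
chr_lengths = {"chrA": 1000, "chrB": 500, "chrC": 800}
insertions = [(0, 10), (1, 10), (2, 10)]

genome_positions = offset_insertions(insertions, ref_names, chr_lengths)
```

The same local position ends up at a different genome-wide position depending on which chromosome it lies on.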
- transposonmapper.mapping.correct_read_position(flags, start_position, readlength)
Correct starting position for reads with reversed orientation
- Parameters
flags (numpy.array) – Flag of each read
start_position (numpy.array) – Start position of each read
readlength (numpy.array) – Length of each read
- Returns
Start positions and flags, corrected for reads with reversed orientation
- Return type
numpy.array
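One plausible form of this correction, sketched with plain lists. It relies on the SAM convention that flag bit 16 marks a read aligned to the reverse strand, and it assumes (this is not confirmed by the text above) that the correction adds the read length to the start position of such reads. The helper name `correct_positions` and the sample values are hypothetical, and unlike the documented function the sketch returns only positions, not flags.

```python
REVERSE_FLAG = 16  # SAM flag bit 0x10: read is aligned to the reverse strand


def correct_positions(flags, start_position, readlength):
    """Hypothetical stand-in: shift the start of reverse-strand reads."""
    corrected = []
    for flag, start, length in zip(flags, start_position, readlength):
        if flag & REVERSE_FLAG:
            # Assumed correction: move the start to the other end of the read.
            corrected.append(start + length)
        else:
            corrected.append(start)
    return corrected


# Made-up values, for illustration only.
flags = [0, 16, 0]
start_position = [100, 200, 300]
readlength = [50, 50, 75]

corrected_positions = correct_positions(flags, start_position, readlength)
```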
- transposonmapper.mapping.find_chromosome_reads(chromosome, N_reads: int)
Find the reads inside each chromosome
- Parameters
chromosome (str) – The name of the chromosome, either as an Arabic number (1, 2) or as a Roman numeral (“I”, “II”)
N_reads (int) –
- Returns
flags, start_position, readlength
- transposonmapper.mapping.get_insertions_and_reads(coordinates, tn_coordinates, readnumb_array)
Computes the total number of transposons per gene, the number of reads per gene, and the distribution of transposons along each gene.
- Parameters
coordinates (dict) – The output of add_chromosome_length(coordinates, chr_lengths_cumsum, ref_tid_roman)
tn_coordinates (numpy.array) – The output of add_chromosome_length_inserts(coordinates, ref_names, chr_lengths)
readnumb_array (numpy.array) – The 1st output of get_reads(bam)
- Returns
dict – A dict in which each key is a gene and each value is the total number of transposons found in that gene
dict – A dict in which each key is a gene and each value is the total number of reads for all the transposons found in that gene
dict – A dict in which each key is a gene and each value is a list of 4 elements:
- the chromosome number
- gene start position
- gene end position
- distribution of reads per transposon found inside the gene
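The three returned dictionaries can be illustrated with a small sketch. It is not the library implementation: it assumes that `tn_coordinates` is a flat list of genome-wide insertion positions, that `readnumb_array` holds the read count of each insertion in the same order, and that gene boundaries are inclusive. The helper name `count_per_gene` and the sample values are hypothetical.

```python
def count_per_gene(coordinates, tn_coordinates, readnumb_array):
    """Hypothetical stand-in: tally insertions and reads inside each gene.

    coordinates: {gene: [chromosome, start, end, direction]} (genome-wide positions)
    tn_coordinates: insertion positions (assumed flat list)
    readnumb_array: read count per insertion, parallel to tn_coordinates
    """
    tn_per_gene = {}
    reads_per_gene = {}
    tn_coordinates_per_gene = {}
    for gene, (chrom, start, end, _direction) in coordinates.items():
        # Read counts of every insertion that falls inside this gene.
        reads_in_gene = [
            reads
            for position, reads in zip(tn_coordinates, readnumb_array)
            if start <= position <= end
        ]
        tn_per_gene[gene] = len(reads_in_gene)
        reads_per_gene[gene] = sum(reads_in_gene)
        tn_coordinates_per_gene[gene] = [chrom, start, end, reads_in_gene]
    return tn_per_gene, reads_per_gene, tn_coordinates_per_gene


# Made-up values, for illustration only.
coordinates = {
    "geneA": ["I", 100, 400, "+"],
    "geneB": ["II", 1050, 1250, "-"],
}
tn_coordinates = [150, 390, 700, 1100]
readnumb_array = [3, 1, 8, 5]

tn_per_gene, reads_per_gene, tn_coordinates_per_gene = count_per_gene(
    coordinates, tn_coordinates, readnumb_array
)
```

The insertion at position 700 lies between the two genes in this example, so it contributes to neither count.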
- transposonmapper.mapping.get_reads(bam)
This function retrieves all reads within a specified genomic region.
- Parameters
bam (pysam.AlignmentFile) – The output of pysam.AlignmentFile(bamfile, "rb")
- Returns
numpy.array – Reads per genomic region
numpy.array – Array of three columns in which the 2nd column gives the start position of each transposon
numpy.array – A copy of the 2nd output