transposonmapper.mapping

transposonmapper.mapping.add_chromosome_length(coordinates, chr_lengths_cumsum, ref_tid_roman)

This function returns a dictionary that gives, for every gene, the chromosome the gene belongs to, the coordinates of its start and end positions, and the direction of the gene.

Parameters

coordinates (dict) – This dictionary is the 1st output from the function read_genes(gff_file, essentials_file, gene_names_file)
chr_lengths_cumsum (dict) – This dictionary is the 2nd output from the function get_sequence_length(bam)
ref_tid_roman (dict) – This dictionary is the output from ref_tid_roman = {key: value for key, value in zip(ref_romannums, ref_tid)}, where ref_romannums = chromosomename_roman_to_arabic()[1] and ref_tid = get_chromosome_names(bam)

Returns

A dictionary that gives, for every gene, the chromosome the gene belongs to, the coordinates of its start and end positions, and the direction of the gene.

Return type

dict
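
As a rough illustration of the idea, and not the package's own implementation, the sketch below places per-chromosome gene coordinates on one continuous genome axis by adding the summed length of all preceding chromosomes. The dictionary layouts, the gene and chromosome names, and the helper name shift_gene_coordinates are assumptions made for this example; the structures actually returned by read_genes and get_sequence_length may differ.

```python
# Hypothetical layouts, assumed for illustration only:
#   coordinates:        gene -> [chromosome (roman numeral), start, end, direction]
#   chr_lengths_cumsum: chromosome id -> summed length of all preceding chromosomes
#   ref_tid_roman:      roman numeral -> chromosome id
def shift_gene_coordinates(coordinates, chr_lengths_cumsum, ref_tid_roman):
    shifted = {}
    for gene, (chrom, start, end, direction) in coordinates.items():
        # Look up how many base pairs precede this chromosome in the genome.
        offset = chr_lengths_cumsum[ref_tid_roman[chrom]]
        shifted[gene] = [chrom, start + offset, end + offset, direction]
    return shifted


# Made-up example data: chromosome I is 1000 bp long, so chromosome II starts at offset 1000.
coordinates = {"GENE_A": ["I", 100, 400, "+"], "GENE_B": ["II", 50, 250, "-"]}
chr_lengths_cumsum = {"chr1": 0, "chr2": 1000}
ref_tid_roman = {"I": "chr1", "II": "chr2"}

genome_coordinates = shift_gene_coordinates(coordinates, chr_lengths_cumsum, ref_tid_roman)
print(genome_coordinates)
```

Genes on the first chromosome keep their coordinates, while genes on later chromosomes are shifted, which lets gene positions be compared directly with genome-wide insertion positions.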

transposonmapper.mapping.add_chromosome_length_inserts(coordinates, ref_names, chr_lengths)

For each insertion location, add the length of all previous chromosomes

Parameters

coordinates (numpy.array) – Third output from get_reads(bam)
ref_names (list) – Output from the following: ref_tid = get_chromosome_names(bam); ref_names = list(ref_tid.keys())
chr_lengths (dict) – First output from get_sequence_length(bam)

Returns

The insertion locations, each shifted by the total length of all previous chromosomes

Return type

numpy.array
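
The offset calculation described here can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the three-column layout of coordinates (chromosome index, start, end), the 1-based chromosome index, the chromosome names, and the helper name shift_insert_positions are all hypothetical and may not match what get_reads actually returns.

```python
import numpy as np


# Hypothetical inputs, assumed for illustration only:
#   coordinates: one row per insertion, columns = (chromosome index, start, end)
#   ref_names:   chromosome names, ordered to match the 1-based chromosome index
#   chr_lengths: chromosome name -> length in base pairs
def shift_insert_positions(coordinates, ref_names, chr_lengths):
    lengths = np.array([chr_lengths[name] for name in ref_names])
    # The offset of a chromosome is the total length of the chromosomes before it.
    offsets = np.concatenate(([0], np.cumsum(lengths)[:-1]))
    chrom_index = coordinates[:, 0] - 1  # convert the 1-based index to 0-based
    return coordinates[:, 1] + offsets[chrom_index]


ref_names = ["chr1", "chr2", "chr3"]
chr_lengths = {"chr1": 100, "chr2": 200, "chr3": 300}
coordinates = np.array([[1, 10, 11], [2, 5, 6], [3, 20, 21]])

shifted = shift_insert_positions(coordinates, ref_names, chr_lengths)
print(shifted)
```

Computing the offsets once with a cumulative sum and indexing them by chromosome avoids looping over individual insertions.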

transposonmapper.mapping.correct_read_position(flags, start_position, readlength)

Correct starting position for reads with reversed orientation

Parameters

flags (numpy.array) – [description]
start_position (numpy.array) – [description]
readlength (numpy.array) – [description]

Returns

Corrected start positions and flags

Return type

numpy.array
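
The exact correction rule is not spelled out above, so the following is only a sketch of one plausible approach. It assumes that flags holds SAM flag values, in which bit 16 (0x10) marks a read mapped to the reverse strand, and that the corrected position of a reverse read is its start position plus its read length. The helper name shift_reverse_reads is hypothetical.

```python
import numpy as np

REVERSE_FLAG = 16  # SAM flag bit 0x10: read is mapped to the reverse strand


def shift_reverse_reads(flags, start_position, readlength):
    flags = np.asarray(flags)
    start_position = np.asarray(start_position).copy()  # leave the caller's array untouched
    readlength = np.asarray(readlength)
    is_reverse = (flags & REVERSE_FLAG) != 0
    # Assumption: for reverse reads, the position of interest is at the other end of the read.
    start_position[is_reverse] += readlength[is_reverse]
    return start_position, is_reverse


flags = np.array([0, 16, 0, 16])
start_position = np.array([100, 200, 300, 400])
readlength = np.array([50, 75, 50, 25])

corrected, is_reverse = shift_reverse_reads(flags, start_position, readlength)
print(corrected)
```

Testing the flag with a bitwise AND rather than an equality check keeps the sketch valid when other flag bits are set alongside the reverse-strand bit.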

transposonmapper.mapping.find_chromosome_reads(chromosome, N_reads: int)

Find the reads inside each chromosome

Parameters

chromosome (str) – The name of the chromosome, either as an Arabic number (1, 2) or as a Roman numeral (“I”, “II”)
N_reads (int) –

Return type

flags, start_position, readlength

transposonmapper.mapping.get_insertions_and_reads(coordinates, tn_coordinates, readnumb_array)

This function computes the total number of transposons per gene, the number of reads per gene, and the distribution of transposons along the gene.

Parameters

coordinates (dict) – This is the output of the function add_chromosome_length(coordinates, chr_lengths_cumsum, ref_tid_roman)
tn_coordinates (numpy.array) – This is the output of the function add_chromosome_length_inserts(coordinates, ref_names, chr_lengths)
readnumb_array (numpy.array) – This is the 1st output of the function get_reads(bam)

Returns

dict – A dict in which each key is a gene and each value is the total number of transposons found in that gene
dict – A dict in which each key is a gene and each value is the total number of reads for all the transposons found in that gene
dict – A dict in which each key is a gene and each value is a list of 4 elements:
- the chromosome number
- gene start position
- gene end position
- distribution of reads per transposon found inside the gene
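
The three dictionaries described above could be built roughly as in the sketch below. It is not the package's implementation: it assumes that tn_coordinates is a one-dimensional array of genome-wide insertion positions, that readnumb_array lists the read count for each insertion in the same order, that gene boundaries are inclusive, and that each entry of coordinates holds [chromosome, start, end, direction]. The helper name count_per_gene is hypothetical.

```python
import numpy as np


# Hypothetical inputs, assumed for illustration only:
#   coordinates:    gene -> [chromosome, start, end, direction], genome-wide positions
#   tn_coordinates: 1D array of genome-wide insertion positions
#   readnumb_array: 1D array with the number of reads for each insertion
def count_per_gene(coordinates, tn_coordinates, readnumb_array):
    tn_per_gene, reads_per_gene, tn_coordinates_per_gene = {}, {}, {}
    for gene, (chrom, start, end, _direction) in coordinates.items():
        # Select the insertions inside the gene (boundaries assumed inclusive).
        inside = (tn_coordinates >= start) & (tn_coordinates <= end)
        reads = readnumb_array[inside]
        tn_per_gene[gene] = int(inside.sum())
        reads_per_gene[gene] = int(reads.sum())
        tn_coordinates_per_gene[gene] = [chrom, start, end, reads.tolist()]
    return tn_per_gene, reads_per_gene, tn_coordinates_per_gene


coordinates = {"GENE_A": ["I", 100, 400, "+"], "GENE_B": ["II", 1050, 1250, "-"]}
tn_coordinates = np.array([150, 390, 800, 1100])
readnumb_array = np.array([3, 1, 7, 2])

tn_per_gene, reads_per_gene, tn_coordinates_per_gene = count_per_gene(
    coordinates, tn_coordinates, readnumb_array
)
print(tn_per_gene, reads_per_gene)
```

The insertion at position 800 falls between the two genes in this made-up data, so it contributes to neither count.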

transposonmapper.mapping.get_reads(bam)

This function retrieves all reads within a specified genomic region.

Parameters

bam (pysam.AlignmentFile) – The output of the function pysam.AlignmentFile(bamfile, "rb")

Returns

numpy.array – reads per genomic region
numpy.array – Array of three columns, where the 2nd one indicates the start position where there was a transposon
numpy.array – A copy from the 2nd output

SATAY pipeline at Delft :)
