transposonmapper.importing
- transposonmapper.importing.load_default_files(gff_file=None, essentials_file=None, gene_names_file=None)
This function loads the files that are used repeatedly throughout the pipeline. For each argument left as None, it looks for the default file inside the satay/data_files folder. Otherwise it returns the path that was passed in.
- Parameters
gff_file (.gff3, optional) – Annotated genome from Saccharomyces cerevisiae (baker’s yeast) (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/GCF_000146045.2_R64/GCF_000146045.2_R64_genomic.gff.gz), by default None
essentials_file (.txt, optional) – Annotated essential genes of yeast, written using systematic names, all in a single column, by default None
gene_names_file (.txt, optional) – Document listing all the Saccharomyces cerevisiae S288c entries present in UniProtKB/Swiss-Prot release 2021_01 of 10-Feb-2021: entries, gene names and cross-references to SGD, by default None
- Returns
str – The path to the .gff3 file required for further analysis.
str – The path to the essential genes file required for further analysis.
str – The path to the gene names file required for further analysis.
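The fallback behaviour described above (use the packaged default when an argument is None, otherwise pass the input through) can be sketched as follows. This is a minimal illustration, not the package's implementation: the helper `resolve_input`, the constant `DEFAULT_DATA_DIR` and the file name `example_genome.gff3` are hypothetical names introduced here.

```python
import os

# Hypothetical location of the packaged defaults; the real package layout may differ.
DEFAULT_DATA_DIR = os.path.join("satay", "data_files")


def resolve_input(path, default_name, data_dir=DEFAULT_DATA_DIR):
    """Return `path` unchanged, or the path of the packaged default when `path` is None."""
    if path is None:
        return os.path.join(data_dir, default_name)
    return path


# No file given: fall back to the (hypothetical) default file name.
gff_path = resolve_input(None, "example_genome.gff3")

# File given: the same path comes back untouched.
custom_path = resolve_input("my_annotation.gff3", "example_genome.gff3")
```

Returning a path in both cases means the caller can treat user-supplied and default files identically in later steps.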
- transposonmapper.importing.load_sgd_tab(sgd_features_file=None)
This function loads the SGD_features.tab file. The latest version of SGD_features.tab is based on Genome Version R64-2-1. If a specific file is provided, the function returns that file; if the argument is None, it returns the standard file provided in the package.
- Parameters
sgd_features_file (str, optional) – Path to an SGD_features.tab file (the latest version is based on Genome Version R64-2-1), by default None
- Returns
The path to the provided file, or to the standard file in the package when no file is given
- Return type
str
- transposonmapper.importing.read_genes(gff_file, essentials_file, gene_names_file)
This function reads the relevant information from gff_file, essentials_file and gene_names_file. From gff_file and essentials_file it extracts the gene coordinates, specifying the chromosome, start, end and direction. From gene_names_file it builds the translation from systematic names to standard names.
- Parameters
gff_file (.gff3) – Annotated genome from Saccharomyces cerevisiae (baker’s yeast) (https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/GCF_000146045.2_R64/GCF_000146045.2_R64_genomic.gff.gz)
essentials_file (.txt) – Annotated essential genes of yeast, written using systematic names, all in a single column
gene_names_file (.txt) – Document listing all the Saccharomyces cerevisiae S288c entries present in UniProtKB/Swiss-Prot release 2021_01 of 10-Feb-2021: entries, gene names and cross-references to SGD
- Returns
dict – gene_coordinates: a dict specifying, for each gene, the chromosome the gene belongs to, the start coordinate, the end coordinate and the strand direction ('+' or '-').
dict – essential_coordinates: a dict specifying, for each annotated essential gene, the chromosome the gene belongs to, the start coordinate, the end coordinate and the strand direction ('+' or '-').
dict – aliases_designation: a dict that maps each systematic gene name to its standard gene name.
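To illustrate how the three returned dictionaries fit together, here is a small sketch built on hand-written stand-ins rather than on real output of read_genes. Several things are assumptions made for illustration: each coordinate entry is a list ordered as [chromosome, start, end, strand], each alias is a plain string, coordinates are inclusive, and the numeric values are placeholders rather than real annotations. The helper `describe_gene` is hypothetical.

```python
# Toy stand-ins for the three dictionaries; coordinates are placeholders, not real annotations.
gene_coordinates = {
    "YAL001C": ["I", 1000, 2500, "-"],
    "YAL002W": ["I", 3000, 4200, "+"],
}
essential_coordinates = {
    "YAL001C": ["I", 1000, 2500, "-"],
}
aliases_designation = {
    "YAL001C": "TFC3",
    "YAL002W": "VPS8",
}


def describe_gene(systematic_name):
    """Combine the three lookups into one summary for a single gene."""
    chromosome, start, end, strand = gene_coordinates[systematic_name]
    return {
        # Fall back to the systematic name when no standard name is listed.
        "standard_name": aliases_designation.get(systematic_name, systematic_name),
        "chromosome": chromosome,
        # Assumes inclusive start and end coordinates.
        "length": end - start + 1,
        "strand": strand,
        "essential": systematic_name in essential_coordinates,
    }


summary = describe_gene("YAL001C")
```

Because all three dictionaries are keyed by systematic name, a single key is enough to recover a gene's position, strand, standard name and essentiality.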