transposonmapper.importing

transposonmapper.importing.load_default_files(gff_file=None, essentials_file=None, gene_names_file=None)

This function locates files that are used repeatedly throughout the pipeline. For each argument that is None, it looks for the default file inside the satay/data_files folder. Otherwise, it returns the given input path unchanged.

Parameters
  • gff_file (.gff3, optional) – Annotated genome of Saccharomyces cerevisiae (baker’s yeast), available at https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/GCF_000146045.2_R64/GCF_000146045.2_R64_genomic.gff.gz, by default None

  • essentials_file (.txt, optional) – Annotated essential genes of yeast, written as systematic names in a single column, by default None

  • gene_names_file (.txt, optional) – UniProtKB/Swiss-Prot document “Yeast (Saccharomyces cerevisiae): entries, gene names and cross-references to SGD” (Release 2021_01 of 10-Feb-2021), which lists all Saccharomyces cerevisiae S288c entries present in that release, by default None

Returns

  • str – Path to the .gff3 file required for further analysis.

  • str – Path to the essential genes file required for further analysis.

  • str – Path to the gene names file required for further analysis.
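The None-fallback described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the function name resolve_default_files, the data_dir argument and the default file names in DEFAULT_NAMES are hypothetical placeholders, because the actual names of the files in satay/data_files are not given here.

```python
import os

# Hypothetical default file names: the real names of the files shipped in
# satay/data_files are not stated in this documentation.
DEFAULT_NAMES = {
    "gff_file": "genome.gff3",
    "essentials_file": "essentials.txt",
    "gene_names_file": "gene_names.txt",
}


def resolve_default_files(gff_file=None, essentials_file=None,
                          gene_names_file=None, data_dir="data_files"):
    """Sketch of the None-fallback: None -> default path in data_dir,
    any other value -> returned unchanged."""
    given = {
        "gff_file": gff_file,
        "essentials_file": essentials_file,
        "gene_names_file": gene_names_file,
    }
    resolved = []
    for key in ("gff_file", "essentials_file", "gene_names_file"):
        path = given[key]
        if path is None:
            # Fall back to the (hypothetical) packaged default file.
            path = os.path.join(data_dir, DEFAULT_NAMES[key])
        resolved.append(path)
    # Same order as the documented return values.
    return tuple(resolved)
```

For example, resolve_default_files(essentials_file="my_essentials.txt") keeps the custom essentials path and fills in defaults for the other two. With the documented function, the three paths are unpacked in the same order: gff_file, essentials_file, gene_names_file = transposonmapper.importing.load_default_files().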

transposonmapper.importing.load_sgd_tab(sgd_features_file=None)

This function loads the file SGD_features.tab. The latest version of SGD_features.tab is based on Genome Version R64-2-1. If a specific file is provided, its path is returned unchanged; if the argument is None, the path of the standard file included in the package is returned.

Parameters

sgd_features_file (str, optional) – Path to an SGD_features.tab file. The latest version of this file is based on Genome Version R64-2-1, by default None

Returns

The path to the SGD_features.tab file: the given path, or the path of the file included in the package if None was passed

Return type

str
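Once a path to SGD_features.tab is available, the file can be read with the standard csv module. The sketch below is an illustration only: read_sgd_features is a hypothetical helper, and the column positions assume the tab-delimited, header-less 16-column layout described in SGD's SGD_features.README (feature type in column 2, systematic name in column 4, standard name in column 5, chromosome in column 9, start and stop in columns 10 and 11, strand in column 12).

```python
import csv


def read_sgd_features(handle, feature_type="ORF"):
    """Hypothetical helper: map systematic names to coordinate tuples.

    Assumes the SGD_features.README column layout (0-based indices below).
    Returns {systematic_name: (standard_name, chromosome, start, stop, strand)}.
    """
    features = {}
    # QUOTE_NONE: treat quote characters in descriptions as plain text.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 12:
            continue  # skip malformed or truncated rows
        if feature_type is not None and row[1] != feature_type:
            continue  # keep only the requested feature type (column 2)
        systematic, standard = row[3], row[4]
        if not systematic or not row[9] or not row[10]:
            continue  # skip rows without a name or without coordinates
        # Coordinates and strand are stored exactly as given in the file.
        features[systematic] = (standard, row[8], int(row[9]), int(row[10]), row[11])
    return features
```

For example, open the path returned by load_sgd_tab() and pass the file handle to read_sgd_features; passing feature_type=None keeps every named feature instead of only ORFs.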

transposonmapper.importing.read_genes(gff_file, essentials_file, gene_names_file)

This function reads the relevant information from gff_file, essentials_file and gene_names_file. From gff_file and essentials_file it extracts the gene coordinates, specifying the chromosome, start, end and direction (strand). From gene_names_file it builds the translation from systematic names to standard gene names.

Parameters
  • gff_file (.gff3) – Annotated genome of Saccharomyces cerevisiae (baker’s yeast), available at https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/GCF_000146045.2_R64/GCF_000146045.2_R64_genomic.gff.gz

  • essentials_file (.txt) – Annotated essential genes of yeast, written as systematic names in a single column

  • gene_names_file (.txt) – UniProtKB/Swiss-Prot document “Yeast (Saccharomyces cerevisiae): entries, gene names and cross-references to SGD” (Release 2021_01 of 10-Feb-2021), which lists all Saccharomyces cerevisiae S288c entries present in that release

Returns

  • dict – gene_coordinates: a dict specifying, for each gene, the chromosome number it belongs to, its start coordinate, its end coordinate and its strand direction (‘+’ or ‘-’).

  • dict – essential_coordinates: a dict specifying, for each annotated essential gene, the chromosome number it belongs to, its start coordinate, its end coordinate and its strand direction (‘+’ or ‘-’).

  • dict – aliases_designation: a dict that maps each systematic gene name to its standard gene name.
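The coordinate-extraction part of read_genes can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the package's code: read_gene_coordinates and read_essential_coordinates are hypothetical helpers; genes are keyed by the GFF3 locus_tag attribute (falling back to Name), which is assumed to hold the systematic name in the NCBI annotation; and the seqid column is kept as-is rather than converted to a chromosome number. The translation to standard names (aliases_designation) is not sketched, because the layout of the UniProt file is not described here.

```python
def read_gene_coordinates(gff_lines):
    """Hypothetical helper: {gene: [seqid, start, end, strand]} for 'gene' rows."""
    coords = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue  # GFF3 rows have 9 columns; keep only gene features
        seqid, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        # Parse the 'key=value;key=value' attribute column.
        fields = dict(item.split("=", 1) for item in cols[8].split(";") if "=" in item)
        # Assumption: locus_tag holds the systematic name; fall back to Name.
        name = fields.get("locus_tag") or fields.get("Name")
        if name:
            coords[name] = [seqid, start, end, strand]
    return coords


def read_essential_coordinates(essential_lines, gene_coordinates):
    """Hypothetical helper: keep coordinates of genes listed one per line."""
    essentials = {}
    for line in essential_lines:
        name = line.strip()
        if name in gene_coordinates:
            essentials[name] = gene_coordinates[name]
    return essentials
```

Both helpers accept any iterable of lines, so an open file handle works as well as a list of strings. Essential genes absent from the annotation are silently dropped in this sketch.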