Reference

ce_detector.detector

class for detecting junction reads

class ce_detector.detector.JunctionDetector(bam_file, reference, quality, output=None)[source]

class for detecting junction reads and recording their positions

Parameters
  • bam_file (str) – filename of the BAM file

  • output (str) – filename of output

  • reference (str) – filename of genome reference

  • quality (int) – quality for filtering junction reads

static check_strand(anchor, acceptor)[source]

check type of strand

Parameters
  • anchor (str) – anchor of read

  • acceptor (str) – acceptor of read

Returns

type of strand (-|+)

Return type

str
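The following is a minimal sketch of how a strand could be inferred, not the package's implementation. It assumes that anchor and acceptor are the two dinucleotides at the intron boundaries as read on the reference, and that only the canonical GT..AG motif (and its reverse complement CT..AC) is recognised. The name guess_strand and the None fallback are hypothetical.

```python
from typing import Optional

# Assumption: canonical intron-boundary dinucleotides mapped to a strand.
# GT..AG is the canonical motif on the forward strand; an intron on the
# reverse strand appears as CT..AC when read on the reference.
_MOTIF_TO_STRAND = {("GT", "AG"): "+", ("CT", "AC"): "-"}


def guess_strand(anchor: str, acceptor: str) -> Optional[str]:
    """Return '+' or '-' for a canonical motif, or None if it is not recognised."""
    return _MOTIF_TO_STRAND.get((anchor.upper(), acceptor.upper()))


print(guess_strand("GT", "AG"))
print(guess_strand("ct", "ac"))
```

Non-canonical motifs return None here; how check_strand itself treats them is not stated in this reference.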

run(chrom, ann_chrom, logger, verbose=False)[source]

detect junction reads and annotate splice sites, then write results to file

Returns

JunctionMap instance

Return type

instance

worker(bam_file, reference, chrom, ann_chrom, quality, idn, junctionmap)[source]

find junction reads and annotate splice sites

Parameters
  • ann_chrom

  • bam_file (instance) – handle of bam_file

  • reference (instance) – handle of reference

  • chrom (str) – chromosome

  • quality (int) – quality for filtering reads

  • idn (int) – identifier of reads

  • junctionmap (instance) – JunctionMap instance

Returns

JunctionMap instance

Return type

instance

class ce_detector.detector.JunctionMap(chrom)[source]

class for storing information about all junction reads

add_read(read)[source]

add read to junctionlist

Parameters

read (instance) – Read instance

write2file(output, header=None)[source]

write all reads in junctionlist to file

Parameters
  • output (str or TextIO) – file name of output

  • header (str) – header of output

class ce_detector.detector.Read(chrom, start, end, idn, score, strand, anchor, acceptor, pvalue)[source]

class for storing information about a single junction read

Parameters
  • chrom (str) – chromosome of genome

  • start (int) – start position of junction read

  • end (int) – end position of junction read

  • idn (int) – index of junction read

  • score (int) – support of junction read

  • strand (str) – strand of junction read (- or +)

  • anchor (str) – anchor of junction read

  • acceptor (str) – acceptor of junction read
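The relationship between Read and JunctionMap can be sketched as a record type and a container that writes one line per record. This is an illustration only: ReadSketch and JunctionMapSketch are hypothetical stand-ins, the tab-separated layout is an assumption, and so is the float type of pvalue, which this reference does not document.

```python
import io
from dataclasses import astuple, dataclass
from typing import List, Optional, TextIO


@dataclass
class ReadSketch:
    """Hypothetical stand-in for Read; field order follows the signature above."""
    chrom: str
    start: int
    end: int
    idn: int
    score: int
    strand: str
    anchor: str
    acceptor: str
    pvalue: float  # assumption: the type is not documented


class JunctionMapSketch:
    """Hypothetical stand-in for JunctionMap."""

    def __init__(self, chrom: str) -> None:
        self.chrom = chrom
        self.junctionlist: List[ReadSketch] = []

    def add_read(self, read: ReadSketch) -> None:
        self.junctionlist.append(read)

    def write2file(self, output: TextIO, header: Optional[str] = None) -> None:
        # Assumed layout: optional header, then one tab-separated line per read.
        if header is not None:
            output.write(header + "\n")
        for read in self.junctionlist:
            output.write("\t".join(str(v) for v in astuple(read)) + "\n")


jm = JunctionMapSketch("chr1")
jm.add_read(ReadSketch("chr1", 100, 200, 1, 5, "+", "GT", "AG", 0.01))
buf = io.StringIO()
jm.write2file(buf, header="#chrom\tstart\tend")
```

The sketch accepts only an open text stream; the real write2file is documented as also accepting a file name.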

ce_detector.annotator

class for annotating junction reads

class ce_detector.annotator.Annotator(database: Any, output=None)[source]

annotate junction reads

Parameters
  • junctionmap (instance) – instance of ce_detector.detector.JunctionMap

  • database (Any) – database of annotation files

  • output (TextIO) – filename of annotated junction reads. Defaults to None

annotate_junction(read, result, db)[source]

annotate junction reads and write results to file

Parameters
  • read (instance) – junction read, an instance of ce_detector.detector.Read

  • result (defaultdict[Any, Any]) – gene list used for annotation

  • db (instance of file) – database of annotation file

static detect_property(start, end, junction_list)[source]

detect the splice type, the number of skipped donors and the number of skipped acceptors

Splice types are D, A, DA, N and NDA.

Parameters
  • start (int) – start of junction read

  • end (int) – end of junction read

  • junction_list (numpy.array) – gene list of junction reads

Returns

splice type, number of skipped donors, number of skipped acceptors
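As a rough illustration of this classification, the sketch below labels one junction against a list of annotated junctions. It assumes the regtools convention (DA: the junction itself is annotated; NDA: donor and acceptor are both annotated but not as a pair; D or A: only one end is annotated; N: neither), plus-strand coordinates where the donor is the junction start, and "skipped" meaning annotated sites strictly inside the junction. The function classify_junction and its input layout are hypothetical.

```python
from typing import Iterable, Tuple


def classify_junction(
    start: int, end: int, known: Iterable[Tuple[int, int]]
) -> Tuple[str, int, int]:
    """Return (splice type, skipped donors, skipped acceptors).

    known holds annotated (donor, acceptor) pairs; plus strand is assumed.
    """
    pairs = set(known)
    donors = {d for d, _ in pairs}
    acceptors = {a for _, a in pairs}

    if (start, end) in pairs:
        kind = "DA"
    elif start in donors and end in acceptors:
        kind = "NDA"
    elif start in donors:
        kind = "D"
    elif end in acceptors:
        kind = "A"
    else:
        kind = "N"

    # Count distinct annotated sites lying strictly inside the junction.
    skipped_donors = sum(1 for d in donors if start < d < end)
    skipped_acceptors = sum(1 for a in acceptors if start < a < end)
    return kind, skipped_donors, skipped_acceptors


known = [(100, 200), (300, 400), (500, 600)]
print(classify_junction(100, 600, known))  # ('NDA', 2, 2)
```

For a minus-strand gene the roles of start and end would swap; that case is left out of the sketch.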

run(junctionmap, logger, verbose=False)[source]

main function used to annotate junction reads

pick all genes covered by one junction read and annotate each of them with the splice type, the number of skipped donors and the number of skipped acceptors

ce_detector.scanner

class for scanning cryptic exons based on the results of the junction detector and the junction annotator

class ce_detector.scanner.Scanner(cutoff, output=None)[source]

class for scanning cryptic exons based on annotated junction reads

Parameters
  • cutoff (int) – cutoff used to filter junction reads with relatively low score or depth

  • output (str) – filename of the result

run(junctionmap, logger, verbose=False) → Iterable[source]

run program to scan cryptic exons

Returns

temporary result used to store cryptic exons

Return type

Iterable

write2file(logger, verbose=False)[source]

start the iterator and write the result to file

ce_detector.scanner.assign_value(df_ce, ces, ns, ce_id, ns_id) → None[source]

assign the value of the child column for every cryptic exon that contains junction reads of type N

The assignment uses the start and end of cryptic exons and junction reads. Note: junction read types are N, D, A, DA and NDA. For details, see https://regtools.readthedocs.io/en/latest/commands/junctions-annotate/

Parameters
  • df_ce (pandas.DataFrame) – pandas.DataFrame of cryptic exons

  • ces (pandas.DataFrame) – pandas.DataFrame of cryptic exons whose gene ids are shared by cryptic exons and junction reads. It contains only gene id, start and end

  • ns (pandas.DataFrame) – pandas.DataFrame of junction reads of type N that have the same gene id as the cryptic exons

  • ce_id (numpy.array) – index of gene id of cryptic exons which have junction reads as children

  • ns_id (numpy.array) – index of gene id of junction reads with N type, which have junction reads as children

ce_detector.scanner.check(axis) → numpy.array[source]

check whether a cryptic exon has children

That is, check whether the cryptic exon is split by other junction reads, based on start and end positions.

Parameters

axis (numpy.array) – an array representing a junction read, including its start and end

Returns

whether junction read can split cryptic exon

Return type

bool
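One way to picture the test is as an interval containment check. The sketch below is an assumption-laden illustration, not the behaviour of check(): it treats a junction read as splitting a cryptic exon when the read lies strictly inside the exon, and the function junctions_inside and its arguments are hypothetical.

```python
import numpy as np


def junctions_inside(ce_start: int, ce_end: int, junctions) -> np.ndarray:
    """Return a boolean mask, one entry per junction read.

    junctions is an array-like of (start, end) rows. Assumption: a read
    splits the exon only if it lies strictly between ce_start and ce_end.
    """
    arr = np.asarray(junctions).reshape(-1, 2)
    return (arr[:, 0] > ce_start) & (arr[:, 1] < ce_end)


mask = junctions_inside(100, 500, [[150, 300], [50, 300], [400, 600]])
print(mask.tolist())  # [True, False, False]
```

Whether the boundaries should be inclusive depends on the coordinate convention in use, which this reference does not specify.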

ce_detector.scanner.find_ce(groups) → Iterable[source]

parse the result obtained from annotation in order to detect cryptic exons

Parameters

groups (GroupBy object returned by DataFrame.groupby) – annotated junction reads grouped by strand and type

Returns

pd.DataFrame of two strands

Return type

Iterable

ce_detector.scanner.split_ce(df_ce, df_n) → Iterable[source]

Iterator: check whether detected cryptic exons are split by other junction reads

Parameters
  • df_ce (pandas.DataFrame) – pandas.DataFrame of cryptic exons returned by find_ce()

  • df_n (pandas.DataFrame) – pandas.DataFrame of junction reads with N type

Returns

df_ce with a new column, children

Return type

iterator