The subject invention leverages data sampling techniques to provide an efficient means to determine co-occurrence count estimations for objects and features from relational data, simplifying measure-of-association determinations. By providing an efficient mechanism to estimate co-occurrence counts, instances ...
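The general idea of estimating co-occurrence counts from a sample can be sketched as follows. This is a minimal illustration using plain uniform sampling of records, not the method claimed in the source; the function name `estimate_cooccurrence` and its parameters are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def estimate_cooccurrence(records, sample_size, seed=0):
    """Estimate pair co-occurrence counts from a uniform sample of records.

    `records` is a list of item collections (assumed layout). Counts seen in
    the sample are scaled up by len(records) / sample size.
    """
    if not records or sample_size <= 0:
        return {}
    rng = random.Random(seed)
    n = min(sample_size, len(records))
    sample = rng.sample(records, n)
    scale = len(records) / n  # each sampled record stands for `scale` records
    counts = Counter()
    for record in sample:
        # Count each unordered pair of distinct items once per record.
        for pair in combinations(sorted(set(record)), 2):
            counts[pair] += 1
    return {pair: c * scale for pair, c in counts.items()}
```

When `sample_size` covers every record the estimate equals the exact count; smaller samples trade accuracy for speed.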
Sampling method for estimating co-occurrence counts. Christopher A. Meek, Carl M. Kadie
Params: corpus (list of list of strings): corpus of documents; window_size (int): size of context window. Return: M (a symmetric numpy matrix of shape (number of unique words in the corpus, number of unique words in the corpus)): co-occurrence matrix of word counts. The ordering of ...
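A function matching this description might look like the sketch below. The name `compute_co_occurrence_matrix`, the default window size, and the sorted word ordering are assumptions, since the original docstring is cut off before it specifies the ordering. To stay dependency-free the sketch returns a nested list rather than a numpy matrix.

```python
def compute_co_occurrence_matrix(corpus, window_size=4):
    """Count how often each word appears within `window_size` tokens of another.

    Returns (M, word2ind): M is a symmetric list-of-lists matrix and word2ind
    maps each word to its row/column index (sorted order, an assumption).
    """
    words = sorted({w for doc in corpus for w in doc})
    word2ind = {w: i for i, w in enumerate(words)}
    n = len(words)
    M = [[0] * n for _ in range(n)]
    for doc in corpus:
        for i, center in enumerate(doc):
            lo = max(0, i - window_size)
            hi = min(len(doc), i + window_size + 1)
            for j in range(lo, hi):
                if j != i:  # skip the center token itself
                    M[word2ind[center]][word2ind[doc[j]]] += 1
    return M, word2ind
```

Because every (center, context) pair is also visited with the roles swapped, the resulting matrix is symmetric by construction.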
The package implements the method introduced in Wiegand and Hennessey et al. (2017a). It identifies significant differences in co-occurrence counts for a given node or set of nodes across two corpora, using Fisher’s exact test. A good place to start is the ‘Introduction to CorporaCoCo’...
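To show what such a test computes, here is a minimal pure-Python sketch of a two-sided Fisher's exact test on a 2x2 table. It is not the package's API. The table layout is an assumption: `a` and `c` are the co-occurrences of a node with one collocate in corpus A and corpus B, and `b` and `d` are the node's remaining co-occurrences in each corpus.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed
    # one; the small tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-7))
```

For example, `fisher_exact_two_sided(3, 0, 0, 3)` sums the two most extreme tables, each with probability 1/20.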
As mentioned, to achieve parallelism in the matrix-building code, the collection of input files can be processed in parallel, with each parallel step independently outputting its collection of product pairs and co-occurrence counts. This is achieved using an Array.Parallel.map function call. This maps ...
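The same map-then-merge pattern can be sketched in Python as a rough analogue of the Array.Parallel.map approach. This is not the original F# code. The names `count_pairs` and `build_counts_parallel` are hypothetical, and each "file" is assumed to be an in-memory list of orders, each order a list of product identifiers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def count_pairs(orders):
    """Count product-pair co-occurrences within one file's orders."""
    counts = Counter()
    for order in orders:
        for pair in combinations(sorted(set(order)), 2):
            counts[pair] += 1
    return counts


def build_counts_parallel(files, max_workers=4):
    """Map count_pairs over the files concurrently, then merge the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(count_pairs, files))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts key by key
    return total
```

Threads keep the sketch simple; for CPU-bound counting in CPython a process pool would be the more likely choice, since threads do not run Python bytecode in parallel.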
Interestingly, the non-codon region has fewer homogeneous motifs (0-0) than heterogeneous motifs (0-1, 0-2 and 0-3) (Fig. 4). In other words, the non-codon region favoured a larger number of short-range co-mutations with codon regions. In line particularly, TNMs were screened among non...
column. In Fig. 1, the microbiome composition data set is represented by a matrix N × D of counts (abundance) of bacteria, where each column represents a different type of bacteria (taxon) and each row represents a different sample. Table 1 Publicly available microbiome composition datasets...
Taxon-taxon counts at high taxonomic ranks were assessed for overrepresentation significance using the hypergeometric distribution implemented by stats::phyper. Mutual exclusion versus co-presence analysis was performed using the binomial distribution implemented by stats::pbinom, with background probability ...
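The overrepresentation test can be illustrated with a small pure-Python version of the hypergeometric upper tail. The mapping of the arguments onto taxon data is an assumption: `N` samples in total, `K` containing the first taxon, `n` containing the second, and `k` containing both. In R the same tail probability would be obtained with `phyper(k - 1, K, N - K, n, lower.tail = FALSE)`.

```python
from math import comb


def hypergeom_upper_tail(k, K, N, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    denom = comb(N, n)
    hi = min(K, n)
    # math.comb returns 0 when the lower index exceeds the upper one, so
    # impossible outcomes contribute nothing to the sum.
    numer = sum(comb(K, i) * comb(N - K, n - i)
                for i in range(max(k, 0), hi + 1))
    return numer / denom
```

A small tail probability indicates that the two taxa co-occur in more samples than independent placement would make likely.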
These calculations are performed according to the scheme shown above in Table 1, but MCOT counts either CEs with more conserved Anchor or Partner motifs, i.e. either -Log10[ERR(Anchor)] > -Log10[ERR(Partner)] or -Log10[ERR(Anchor)] ≤ -Log10[ERR(Partner)]. Finally, to estimate the...
For example, a hypothetical patient with condition C always counts towards the general population prevalence of C, but contributes to PCEHR if and only if the patient has a recorded diagnosis for C in the medical records. We discuss the differences between EHR prevalence and general ...