The Histone Code
The mechanisms by which Histones influence the transcription of genes are the essence of biomatics. The similarities to man-made digital computation are striking, and biologists must understand these principles in order to gain a full understanding of how DNA functions. While the input to the Histone complex may indeed be like a “code”, the Histone complex itself is a complex “Finite State Machine”, and beyond. It may be characterized as a powerful microprocessor with an n-bit input bus, where n is the number of modifiable sites on the histone tails (n is currently about 40-50, with more sites being discovered). This circuit is made up of the 20 elemental algebraic structures: the amino acids (The Amino Acid Code). Each amino acid has a unique logical design. Each may be characterized as representing not only a unique topology but also a collection of dynamic, topologically changing electrical circuits with many layers of processing ongoing. And like snowflakes, no two histone complexes in the body are exactly alike. But with time and the application of current as well as yet-to-be-discovered technologies, these mysteries shall be solved.
The above image was generated by a simulation of a polypeptide chain fixed at one end, with a paintbrush on the other end and bonds allowed to rotate freely with a Fibonacci ratio (5:3). See Image Gallery for more examples. This is a situation similar to the "unstructured" Histone tails. Note the similarity to da Vinci's Vitruvian Man.
Histone tails are the most common sites of post-translational modifications. Tail modifications alter both inter- and intranucleosomal interactions to disrupt the condensed chromatin structure, thereby playing a crucial role in gene access. The Histone Tail Codes
Epigenetic Control Systems
CpG islands are genomic regions that contain a high frequency of CG dinucleotides. In mammalian genomes, CpG islands are typically 300-3,000 base pairs in length. They are in and near approximately 40% of promoters of mammalian genes (about 70% in human promoters). The "p" in CpG notation refers to the phosphodiester bond between the cytidine and the guanosine.
CpG islands are characterized by CpG dinucleotide content of at least 60% of that which would be statistically expected (~4-6%), whereas the rest of the genome has much lower CpG frequency (~1%), a phenomenon called CG suppression. Unlike CpG sites in the coding region of a gene, in most instances, the CpG sites in the CpG islands of promoters are unmethylated if genes are expressed. This observation led to the speculation that methylation of CpG sites in the promoter of a gene may inhibit the expression of a gene. Methylation is central to imprinting alongside histone modifications.
Histones undergo posttranslational modifications which alter their interaction with DNA and nuclear proteins. The H3 and H4 histones have long tails protruding from the nucleosome which can be covalently modified at several places. Modifications of the tail include methylation, acetylation, phosphorylation, ubiquitination, sumoylation, citrullination, and ADP-ribosylation. The core of the histones (H2A and H3) can also be modified. Combinations of modifications are thought to constitute a code, the so-called "histone code". Histone modifications act in diverse biological processes such as gene regulation, DNA repair and chromosome condensation (mitosis).
The common nomenclature of histone modifications is as follows:
- The name of the histone (e.g. H3)
- The single letter amino acid abbreviation (e.g. K for Lysine) and the amino acid position in the protein
- The type of modification (Me: methyl, P: phosphate, Ac: acetyl, Ub: ubiquitin)
So H3K4Me denotes the methylation of H3 on the 4th lysine from the start (N-terminal) of the protein.
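The naming rule is regular enough to parse mechanically. The following is a minimal sketch that handles only the four modification abbreviations listed above; the regular expression, the function name `parse_mark` and the returned dictionary keys are illustrative assumptions rather than an established standard.

```python
import re

# Abbreviations taken from the list above.
MODIFICATIONS = {"Me": "methyl", "P": "phosphate", "Ac": "acetyl", "Ub": "ubiquitin"}

# histone name, one-letter amino acid, position, modification type
MARK_PATTERN = re.compile(r"^(H\d[AB]?)([A-Z])(\d+)(Me|P|Ac|Ub)$")


def parse_mark(mark):
    """Split a mark such as 'H3K4Me' into its named parts."""
    match = MARK_PATTERN.match(mark)
    if match is None:
        raise ValueError(f"not a recognised histone mark: {mark!r}")
    histone, residue, position, modification = match.groups()
    return {
        "histone": histone,
        "residue": residue,
        "position": int(position),
        "modification": MODIFICATIONS[modification],
    }
```

For example, `parse_mark("H3K4Me")` yields histone `H3`, residue `K`, position `4` and modification `methyl`.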
Protein folded states are kinetic hubs
Scale Free Networks
Using a Web crawler, physicist Albert-Laszlo Barabasi and his colleagues at the University of Notre Dame in Indiana in 1998 mapped the connectedness of the Web. They were surprised to find that the structure of the Web didn't conform to the then-accepted model of random connectivity. Instead, their experiment yielded a connectivity map that they christened "scale-free."
Barabasi and his team had been doing work that modeled surfaces in terms of fractals, which are also scale-free. Scale-free networks have been used to explain behaviors as diverse as those of power grids, the stock market and cancerous cells, as well as the dispersal of sexually transmitted diseases.
Put simply, the nodes of a scale-free network aren't randomly or evenly connected. Scale-free networks include many "very connected" nodes, hubs of connectivity that shape the way the network operates. The ratio of very connected nodes to the number of nodes in the rest of the network remains constant as the network changes in size.
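One way to see how hubs arise is to grow a network by preferential attachment, the Barabási–Albert construction. This is a swapped-in illustrative model and not the Web-crawl experiment described above. In the sketch, each new node links to `m` existing nodes chosen with probability proportional to their current degree; the function names and parameters are hypothetical.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, m, seed=0):
    """Grow a graph in which new nodes prefer already well-connected nodes."""
    rng = random.Random(seed)
    # Start from a small complete graph on m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in `stubs` once per edge endpoint, so a uniform
    # draw from `stubs` is a degree-proportional draw over nodes.
    stubs = [v for edge in edges for v in edge]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in sorted(targets):
            edges.append((new, t))
            stubs.extend((new, t))
    return edges


def degree_counts(edges):
    """Map each node to its number of connections."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree
```

Comparing `max(degree_counts(edges).values())` with the median degree shows a few nodes holding far more links than the typical node, which is the hub structure described above.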
Gene Cluster Dynamics
The above diagram represents a (Cayley table) Transition Algebra of Histone modifications. Given an initial histone state S0 at time T0 for some histone-mediated process, define a Path as: (A,B)1,(A,B)2,...(A,B)n, where the above table specifies the state transition for the case of a single amino acid with 4 dynamic 2- or 3-state sites based on rotations about covalent bonds (e.g. lysine). Arginine has the most potential sites, with five. Given an initial state of an amino acid circuit, the above table can be used as a transitional algebra to record or program the device. The top row represents the codification of the state change that is applied to the current state, which is represented by the first column.
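The table-lookup mechanics can be sketched in a few lines. The table from the diagram is not reproduced in the text, so the example below substitutes the Cayley table of the cyclic group Z_3 as a stand-in for a single three-state site; that choice, and the names `cyclic_cayley_table` and `run_path`, are assumptions made purely for illustration.

```python
def cyclic_cayley_table(n):
    """Cayley table of the cyclic group Z_n: table[a][b] = (a + b) mod n."""
    return [[(a + b) % n for b in range(n)] for a in range(n)]


def run_path(table, start, path):
    """Apply a sequence of state changes and record every state visited.

    Rows are indexed by the current state (first column of the table) and
    columns by the change applied (top row of the table).
    """
    state = start
    history = [state]
    for change in path:
        state = table[state][change]
        history.append(state)
    return history
```

With the Z_3 stand-in, `run_path(cyclic_cayley_table(3), 0, [1, 1, 2, 1])` visits the states 0, 1, 2, 1, 2. Recording the history corresponds to "recording" the device, and choosing the path corresponds to "programming" it.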
Finite State Machine
In computation, a finite state machine (FSM) is event driven if the creator of the FSM intends to think of the machine as consuming events or messages. This is in contrast to the parsing-theory origins of the term finite-state machine where the machine is described as consuming characters or tokens.
Often these machines are implemented as threads or processes communicating with one another as part of a larger application. For example, an individual car in a traffic simulation might be implemented as an event-driven finite-state machine.
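The traffic-simulation example can be made concrete with a transition table keyed by (state, event). The states, the events and the `Car` class below are hypothetical and chosen only to show the event-driven style; a real simulation would define its own.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("stopped", "green_light"): "driving",
    ("driving", "red_light"): "braking",
    ("braking", "halted"): "stopped",
    ("braking", "green_light"): "driving",
}


class Car:
    """A car modelled as an event-driven finite-state machine."""

    def __init__(self):
        self.state = "stopped"

    def handle(self, event):
        # Events with no entry for the current state leave it unchanged.
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state
```

Feeding a car the events `green_light`, `red_light`, `halted` takes it from `stopped` through `driving` and `braking` back to `stopped`. In a larger application each car would consume such events from its own message queue.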
Mealy and Moore Machines
In the theory of computation, a Moore machine is a finite state automaton where the outputs are determined by the current state alone (and do not depend directly on the input). The state diagram for a Moore machine will include an output signal for each state. Compare with a Mealy machine, which maps transitions in the machine to outputs.
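The difference is easiest to see when both machine types solve the same task. The sketch below uses a rising-edge detector over a bit stream as an illustrative task (it does not come from the text): the Mealy version attaches the output to the transition, while the Moore version needs an extra state whose only job is to carry the output.

```python
def mealy_rising_edge(bits):
    """Mealy machine: the output depends on the current state AND the input."""
    state = 0  # the previous input bit
    outputs = []
    for b in bits:
        outputs.append(1 if state == 0 and b == 1 else 0)
        state = b
    return outputs


# Moore machine: the output is a function of the state alone.
MOORE_NEXT = {
    ("low", 0): "low", ("low", 1): "rose",
    ("rose", 0): "low", ("rose", 1): "high",
    ("high", 0): "low", ("high", 1): "high",
}
MOORE_OUTPUT = {"low": 0, "rose": 1, "high": 0}


def moore_rising_edge(bits):
    """Emit the output attached to each state the machine enters."""
    state = "low"
    outputs = []
    for b in bits:
        state = MOORE_NEXT[(state, b)]
        outputs.append(MOORE_OUTPUT[state])
    return outputs
```

Both functions return the same output sequence for the same input; the Moore machine pays for its state-only outputs with a third state.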
Math, Biology and Music: Fugue
In music, a fugue (pronounced /ˈfjuːg/) is a type of contrapuntal composition or technique of composition for a fixed number of parts, normally referred to as "voices", irrespective of whether the work is vocal or instrumental. In the Middle Ages, the term was widely used to denote any works in canonic style; by the Renaissance, it had come to denote specifically imitative works. Since the 17th century the term fugue has described what is commonly regarded as the most fully developed procedure of imitative counterpoint. A fugue opens with one main theme, the subject, which then sounds successively in each voice in imitation; when each voice has entered, the exposition is complete; usually this is followed by a connecting passage, or episode, developed from previously heard material; further "entries" of the subject then are heard in related keys. Episodes and entries are usually alternated until the "final entry" of the subject, by which point the music has returned to the opening key, or tonic, which is often followed by closing material, the coda. In this sense, fugue is a style of composition, rather than a fixed structure. Though there are certain established practices, in writing the exposition for example, composers approach the style with varying degrees of freedom and individuality.
The form evolved during the 17th century from several earlier types of contrapuntal compositions, such as imitative ricercars, capriccios, canzonas, and fantasias. Middle and late Baroque composers such as Dietrich Buxtehude (1637–1707) and Johann Pachelbel (1653–1706) contributed greatly to the development of the fugue, and the form reached ultimate maturity in the works of Johann Sebastian Bach (1685–1750). With the decline of sophisticated contrapuntal styles at the end of the Baroque period, the fugue's popularity as a compositional style waned, eventually giving way to sonata form. Nevertheless, composers from the 1750s to the present day continue to write and study fugues for various purposes; fugues appear in the works of Mozart (e.g. Kyrie Eleison of the Requiem in D minor) and Beethoven (e.g. end of the Credo of the Missa Solemnis), and many composers such as Anton Reicha (1770–1836) and Dmitri Shostakovich (1906–1975) wrote cycles of fugues.
The English term fugue originates in the 16th century and is derived from either the French or Italian fuga, which in turn comes from Latin, also fuga, which is itself related to both fugere (‘to flee’) and fugare (‘to chase’). The adjectival form is fugal. Variants include fughetta (literally, 'a small fugue') and fugato (a passage in fugal style within another work that is not a fugue).
Learning to recognize and apply templates such as the hypercube can greatly simplify the task of designing and implementing parallel programs.
The hypercube communication template allows information to be propagated among P tasks in just log P steps. Each algorithm considered in this case study has exploited this property to perform some form of all-to-all communication. For example, in matrix transposition each task requires values from every other task; in sorting, the position of each value in the final sequence depends on all other values. Many other parallel algorithms can be naturally formulated in terms of the same template, once the need for all-to-all communication is recognized.
The three steps of the O(log P) matrix transpose algorithm when P=N=8.
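The log P property can be checked with a small serial simulation. The sketch below is not the matrix transpose algorithm itself: it simulates an all-gather in which, at step d, task i exchanges everything it knows with task i XOR 2^d. The function name and the dictionary representation of each task's knowledge are assumptions made for illustration.

```python
def hypercube_allgather(values):
    """Simulate all-to-all propagation among P tasks on a hypercube.

    Returns (what each task knows at the end, number of exchange steps).
    """
    p = len(values)
    assert p > 0 and p & (p - 1) == 0, "P must be a power of two"
    known = [{i: v} for i, v in enumerate(values)]
    steps = 0
    d = 1
    while d < p:
        # All exchanges of one step happen at once, so the new state is
        # built entirely from the old one before it is replaced.
        known = [{**known[i], **known[i ^ d]} for i in range(p)]
        d <<= 1
        steps += 1
    return known, steps
```

With P = 8 tasks the loop runs three times, after which every task holds all eight values.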
In mathematics, in matrix theory, a permutation matrix is a square (0,1)-matrix that has exactly one entry 1 in each row and each column and 0's elsewhere. Each such matrix represents a specific permutation of m elements and, when used to multiply another matrix, can produce that permutation in the rows or columns of the other matrix.
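A short sketch shows the row and column effect directly. It assumes the convention that row i of the permutation matrix has its single 1 in column `perm[i]`; other texts use the transposed convention, and the helper names are illustrative.

```python
def permutation_matrix(perm):
    """Square 0/1 matrix whose row i has its single 1 in column perm[i]."""
    n = len(perm)
    return [[1 if j == perm[i] else 0 for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


P = permutation_matrix([2, 0, 1])
M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
rows_permuted = matmul(P, M)   # multiplying on the left reorders the rows of M
cols_permuted = matmul(M, P)   # multiplying on the right reorders the columns of M
```

Here row i of `rows_permuted` is row `perm[i]` of `M`, while `cols_permuted` contains the columns of `M` in a rearranged order.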
In mathematics, especially in probability and combinatorics, a doubly stochastic matrix (also called bistochastic) is a square matrix of nonnegative real numbers, each of whose rows and columns sums to 1. Thus, a doubly stochastic matrix is both left stochastic and right stochastic.
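The definition translates directly into a check. The function name and the floating-point tolerance below are illustrative choices.

```python
def is_doubly_stochastic(matrix, tol=1e-9):
    """True if the matrix is square, nonnegative, and every row and
    every column sums to 1 (within a small floating-point tolerance)."""
    n = len(matrix)
    if n == 0 or any(len(row) != n for row in matrix):
        return False
    if any(x < 0 for row in matrix for x in row):
        return False
    rows_ok = all(abs(sum(row) - 1) <= tol for row in matrix)
    cols_ok = all(
        abs(sum(matrix[i][j] for i in range(n)) - 1) <= tol for j in range(n)
    )
    return rows_ok and cols_ok
```

A matrix such as `[[0.9, 0.1], [0.5, 0.5]]` passes the row test but fails the column test, so it is right stochastic without being doubly stochastic.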
Stochastic Matrix
In mathematics, a stochastic matrix, probability matrix, or transition matrix is used to describe the transitions of a Markov chain. It has found use in probability theory, statistics and linear algebra, as well as computer science. There are several different definitions and types of stochastic matrices.
In mathematics, a Markov chain, named after Andrey Markov, is a discrete-time stochastic process with the Markov property. Having the Markov property means that, given the present state, future states are independent of the past states. In other words, the present state description fully captures all the information that can influence the future evolution of the process. Thus, given the present, the future is conditionally independent of the past.
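The Markov property shows up in code as an update that takes only the current distribution as input. The sketch below assumes a right stochastic matrix, in which `P[i][j]` is the probability of moving from state i to state j and each row sums to 1; the two-state matrix and the function names are hypothetical.

```python
def step(dist, P):
    """One transition: the new distribution depends only on the current one."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def evolve(dist, P, steps):
    """Apply the transition matrix repeatedly."""
    for _ in range(steps):
        dist = step(dist, P)
    return dist


# Hypothetical two-state chain; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]
```

Starting from `[1.0, 0.0]`, repeated calls to `step` approach the stationary distribution of this chain, which satisfies 0.1 × π0 = 0.5 × π1 and is therefore (5/6, 1/6).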
Latin hypercube sampling
The statistical method of Latin hypercube sampling (LHS) was developed to generate a distribution of plausible collections of parameter values from a multidimensional distribution. The sampling method is often applied in uncertainty analysis.
In the context of statistical sampling, a square grid containing sample positions is a Latin square if (and only if) there is only one sample in each row and each column. A Latin hypercube is the generalisation of this concept to an arbitrary number of dimensions, whereby each sample is the only one in each axis-aligned hyperplane containing it.
When sampling a function of N variables, the range of each variable is divided into M equally probable intervals. M sample points are then placed to satisfy the Latin hypercube requirements; note that this forces the number of divisions, M, to be equal for each variable. Also note that this sampling scheme does not require more samples for more dimensions (variables); this independence is one of the main advantages of this sampling scheme. Another advantage is that random samples can be taken one at a time, remembering which samples were taken so far.
Orthogonal sampling adds the requirement that the entire sample space must be sampled evenly. Although more efficient, orthogonal sampling strategy is more difficult to implement since all random samples must be generated simultaneously.
In two dimensions the difference between random sampling, Latin Hypercube sampling and orthogonal sampling can be explained as follows:
- In random sampling, new sample points are generated without taking into account the previously generated sample points. Thus one does not necessarily need to know beforehand how many sample points are needed.
- In Latin Hypercube sampling one must first decide how many sample points to use and for each sample point remember in which row and column the sample point was taken.
- In Orthogonal Sampling, the sample space is divided into equally probable subspaces, the figure above showing four subspaces. All sample points are then chosen simultaneously making sure that the total ensemble of sample points is a Latin Hypercube sample and that each subspace is sampled with the same density.
Thus, orthogonal sampling ensures that the ensemble of random numbers is a very good representative of the real variability, and LHS ensures that the ensemble is representative of the real variability, whereas traditional random sampling (sometimes called brute force) is just an ensemble of random numbers without any guarantees.
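A minimal sketch of Latin hypercube sampling follows. It assumes every variable is uniform on [0, 1), so that equally probable intervals are simply equal-width intervals; for other distributions each coordinate would be mapped through the inverse cumulative distribution. The function name and the `seed` parameter are illustrative.

```python
import random


def latin_hypercube(m, n_dims, seed=None):
    """Draw m points in the unit hypercube so that, along every axis,
    each of the m equal-width intervals contains exactly one point."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        intervals = list(range(m))
        rng.shuffle(intervals)  # independent random pairing per dimension
        # one uniformly placed point inside each chosen interval
        columns.append([(k + rng.random()) / m for k in intervals])
    return [tuple(column[i] for column in columns) for i in range(m)]
```

Note that the number of points is `m` regardless of `n_dims`, which is the independence from dimension mentioned above. Plain random sampling would instead call `rng.random()` for every coordinate with no bookkeeping of intervals.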
Monte Carlo Integration
In order to integrate a function over a complicated domain D, Monte Carlo integration picks random points over some simple domain D' which is a superset of D, checks whether each point is within D, and estimates the area of D (volume, n-dimensional content, etc.) as the area of D' multiplied by the fraction of points falling within D.
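As a concrete sketch, the complicated domain can be taken to be the unit disk and the simple enclosing domain the square [-1, 1] × [-1, 1]. Both choices, along with the function name and parameters, are illustrative assumptions.

```python
import random


def monte_carlo_area(inside, bounds, n_points, seed=None):
    """Estimate the area of a region enclosed by a rectangle.

    inside   -- predicate (x, y) -> bool for membership in the region
    bounds   -- ((x_min, x_max), (y_min, y_max)) of the enclosing rectangle
    """
    rng = random.Random(seed)
    (x0, x1), (y0, y1) = bounds
    box_area = (x1 - x0) * (y1 - y0)
    hits = 0
    for _ in range(n_points):
        x = rng.uniform(x0, x1)
        y = rng.uniform(y0, y1)
        if inside(x, y):
            hits += 1
    # area of the rectangle times the fraction of points that fell inside
    return box_area * hits / n_points


def in_unit_disk(x, y):
    return x * x + y * y <= 1.0
```

Calling `monte_carlo_area(in_unit_disk, ((-1, 1), (-1, 1)), 20000)` gives an estimate near π, with an error that shrinks roughly as one over the square root of the number of points.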
Cellular Potts model
The cellular Potts model (CPM) is a lattice-based computational modeling method to simulate the collective behavior of cellular structures. Other names for the CPM are extended large-q Potts model and Glazier and Graner model. First developed by James Glazier and Francois Graner in 1992 as an extension of large-q Potts model simulations of coarsening in metallic grains and soap froths, it has now been used to simulate foam, biological tissues, fluid flow and reaction-advection-diffusion equations. In the CPM a generalized "cell" is a simply connected domain of pixels with the same cell id (formerly spin). A generalized cell may be a single soap bubble, an entire biological cell, part of a biological cell, or even a region of fluid.
The CPM is evolved by updating the cell lattice one pixel at a time based on a set of probabilistic rules. In this sense, the CPM can be thought of as a generalized cellular automaton (CA). Although it also closely resembles certain Monte Carlo methods, such as the large-q Potts model, many subtle differences separate the CPM from Potts models and standard spin-based Monte Carlo schemes.
The primary rule base has three components:
- rules for selecting putative lattice updates
- a Hamiltonian or effective energy function that is used for calculating the probability of accepting lattice updates
- additional rules not included in the first two components
The CPM can also be thought of as an agent based method in which cell agents evolve, interact via behaviors such as adhesion, signalling, volume and surface area control, chemotaxis and proliferation. Over time, the CPM has evolved from a specific model to a general framework with many extensions and even related methods that are entirely or partially off-lattice.
The central component of the CPM is the definition of the Hamiltonian. The Hamiltonian is determined by the configuration of the cell lattice and perhaps other sub-lattices containing information such as the concentrations of chemicals. The original CPM Hamiltonian included adhesion energies, and volume and surface area constraints. We present it here without definition as an illustration and will discuss it in greater detail later:
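A minimal sketch of how such an energy might be evaluated on a small lattice is given below. It is a simplification under stated assumptions, not the full Hamiltonian: it includes only an adhesion term (a single constant `J` for every pair of neighbouring pixels that belong to different cells, ignoring cell types) and a quadratic volume constraint with strength `lam`. The surface area constraint is omitted, and all names are hypothetical.

```python
def cpm_energy(lattice, target_volume, J=1.0, lam=1.0):
    """Simplified effective energy of a 2D cell lattice.

    lattice        -- list of rows of cell ids
    target_volume  -- dict mapping every cell id present to its target pixel count
    """
    rows, cols = len(lattice), len(lattice[0])
    adhesion = 0.0
    volume = {}
    for r in range(rows):
        for c in range(cols):
            cell = lattice[r][c]
            volume[cell] = volume.get(cell, 0) + 1
            # Look right and down only, so each neighbour pair counts once.
            if c + 1 < cols and lattice[r][c + 1] != cell:
                adhesion += J
            if r + 1 < rows and lattice[r + 1][c] != cell:
                adhesion += J
    constraint = sum(
        lam * (v - target_volume[cell]) ** 2 for cell, v in volume.items()
    )
    return adhesion + constraint
```

In a full simulation this function would be evaluated before and after a proposed pixel change, and the energy difference would feed the acceptance probability of the update.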
Circadian clocks rhythmically coordinate biological processes in resonance with the environmental cycle. The clock function relies on negative feedback loops that generate 24-h rhythms in multiple outputs. In Arabidopsis thaliana, the clock component TIMING OF CAB EXPRESSION1 (TOC1) integrates the environmental information to coordinate circadian responses. Here, we use chromatin immunoprecipitation as well as physiological and luminescence assays to demonstrate that proper photoperiodic phase of TOC1 expression is important for clock synchronization of plant development with the environment. Our studies show that TOC1 circadian induction is accompanied by clock-controlled cycles of histone acetylation that favor transcriptionally permissive chromatin structures at the TOC1 locus. At dawn, TOC1 repression relies on the in vivo circadian binding of the clock component CIRCADIAN CLOCK ASSOCIATED1 (CCA1), while histone deacetylase activities facilitate the switch to repressive chromatin structures and contribute to the declining phase of TOC1 waveform around dusk. The use of cca1 late elongated hypocotyl double mutant and CCA1-overexpressing plants suggests a highly repressing function of CCA1, antagonizing H3 acetylation to regulate TOC1 mRNA abundance. The chromatin remodeling activities relevant at the TOC1 locus are distinctively modulated by photoperiod, suggesting a mechanism by which the clock sets the phase of physiological and developmental outputs.
Histone Stoichiometry and Metabolic Flux Analysis
Stoichiometric network analysis
Stoichiometric approaches make use of reaction stoichiometry when trying to determine metabolic pathways. In contrast, path-finding approaches propose an alternative view based on graph theory in which reaction stoichiometry is not considered. Path-finding approaches use shortest path and k-shortest path concepts.
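The path-finding view can be sketched with a breadth-first search over a directed graph whose nodes are metabolites and whose edges mean "can be converted into". The graph below uses placeholder labels rather than a real pathway, and the function name is illustrative; reaction stoichiometry plays no part, exactly as described above.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search for a path with the fewest conversion steps.

    graph -- dict mapping a node to the list of nodes reachable in one step
    Returns the path as a list of nodes, or None if no path exists.
    """
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


# Placeholder network: A can become B or C, both lead to D, D leads to E.
network = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
```

A k-shortest-path search would extend this idea by also reporting the alternative route through C.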