Sunday, 31 May 2009

bioinformatics - Confusion related to the use of PCA to determine the background network

I am far from an expert, but I read the paper and maybe I can help clarify things a bit.



Let's start with the simplest answer to your question:
In the last paragraph of the paper's Introduction, the authors say:




"Throughout this manuscript [...]'background networks' and principle components are used interchangeably." ([1] Torkamani and Schork, Genetic background and drug response, The Pharmacogenomics Journal (2012) 12, 446-452; doi:10.1038/tpj.2011.35)




So they did not "use" PCA to get the background networks; they simply call the PCs background networks to indicate how they interpret them. As stated earlier in the introduction:




"...interacting networks cannot be expected to correlate strongly with drug response, as their influence may only be observed when the major determinant of drug response and the interacting network complement one another or are both at a synergistic state. A major problem with identifying these interaction partners [...] is the extremely large number of possible partners [...] and [...] that individual genes are unlikely to accurately represent the overall state of a biological network." [1]




As far as I understand the article, they suggest PCA as a kind of compromise. Ignoring interactions would miss associations for every network that lacks a single gene representing the network's state accurately enough. Including all interactions is infeasible due to the huge number of gene pairs. With PCA, the number of interactions can be decreased by orders of magnitude while keeping a maximum of information (as in variation): instead of using the interaction (as in product) of all probes with all probes, they only consider those of all probes with the first six PCs.
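To get a feel for the size of that reduction, here is a minimal sketch that just counts interaction terms. The probe count of 20,000 is a hypothetical array size chosen for illustration, not a number taken from the paper, and the function name is made up for this example.

```python
from math import comb

def interaction_counts(n_probes, n_components=6):
    """Return (all probe-probe pairs, probe-by-PC interaction terms)."""
    all_pairs = comb(n_probes, 2)        # every unordered pair of distinct probes
    with_pcs = n_probes * n_components   # each probe times each retained PC
    return all_pairs, with_pcs

# Hypothetical array size, used only to illustrate the scale of the reduction.
pairs, reduced = interaction_counts(20_000)
print(pairs, reduced, round(pairs / reduced))
```

With these assumed numbers, the probe-by-PC design needs more than a thousand times fewer terms than the full pairwise design.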



I think, in this context, 'background' is not used in the sense of 'background noise' but of 'cultural background': instead of assessing individual interactions, genes are assigned a 'background'. Since this corresponds to a) the biological network(s) they belong to and b) the principal component(s) they contribute to, those two ideas are used synonymously by calling PCs 'background networks'.

human biology - Can someone explain the color-changing unit (CCU) to me?

I've been physically carrying out serial tenfold dilutions on samples of Ureaplasma to work out the color-changing units (CCU).



As a definition, the CCU is the highest dilution at which there is a color change.



If the highest dilution is 10^3, apparently that means that there are 10^3 cells per ml in the original sample. But this is what I do not understand. How are we sure of this?



I feel like there is a lot that I am missing.



Any help would be greatly appreciated.

Thursday, 28 May 2009

nutrition - Possible to Gain More Weight than the Food You Eat

No, it is not possible. Humans are heterotrophic organisms, which means that we use organic molecules (i.e., food) as a source of nutrients and energy. We use the nutrients to add mass to our bodies. These nutrients are the familiar carbohydrates, proteins, lipids (fats), etc. During digestion, food is broken down into simpler organic nutrient molecules that are then used to make our body tissues.



So even if we used food only for nutrients, we could not gain more weight than the food we consume. But note that food is also used for energy. This means that some of the mass of the food that we eat is not used to add mass to the body but is "burned" as metabolic fuel. The mass of the food used for energy is expelled from the body as waste, in the $CO_2$ that we exhale and in the metabolic waste in the urine.



The carbon in the air is mainly $CO_2$, which is an inorganic molecule. Only autotrophic organisms like plants can use inorganic molecules as a source of nutrients. Since inorganic molecules usually contain less potential energy than organic molecules, autotrophic organisms need a different source of energy. Plants use sunlight.



Finally, fat contains about 9 Calories per gram, whereas a Big Mac has about 2.4 Calories per gram, so fat has almost 4 times the energy density of a Big Mac.



(Calories measure the amount of heat that is released if all of the energy in the organic molecules is released, so they are an estimate of how much energy the body can get from food.)
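The arithmetic behind that comparison can be sketched as follows. It assumes, purely for illustration, that every Calorie eaten is converted to stored fat with no losses, which overstates what a real body can do; the function name is made up for this example.

```python
FAT_KCAL_PER_G = 9.0       # figure quoted above for fat
BIG_MAC_KCAL_PER_G = 2.4   # figure quoted above for a Big Mac

def min_food_grams_per_gram_fat(food_kcal_per_g):
    """Least food mass whose energy equals that stored in 1 g of fat."""
    return FAT_KCAL_PER_G / food_kcal_per_g

# Even with lossless conversion (an assumption), each gram of fat
# costs several grams of Big Mac.
print(min_food_grams_per_gram_fat(BIG_MAC_KCAL_PER_G))
```

So on energy grounds alone, the mass of fat gained stays well below the mass of food eaten.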

Sunday, 24 May 2009

human biology - What is the healing process of mouth wounds?

To build on the answers from @Armatus and @S-Sunil



The healing mechanism involves the inflammatory process, which is the same in almost the entire body. In particular in both skin and mucosa (both referred to as "epithelial" tissues), when there is a break, platelets and clotting factors clot off any bleeding vessels, white blood cells (neutrophils and macrophages in particular) collect and destroy any bacteria, dead cells and muck, and then the process of regeneration occurs (with ongoing inflammation), where stem cells in the surrounding tissue regrow cells, new blood vessels may be formed and scar tissue is laid down to give extra strength. Eventually after these stages of healing, there is "remodelling" where the structure basically gets better.






As to why oral wounds heal quickly and don't get infected that much, there are a bunch of reasons. One is that the head and neck have an excellent blood supply: just think of scalp wounds, where you bleed like crazy but then they heal very well. Another is that mucous membranes have immune functions that stop invading microbes. Generally speaking we divide this into "innate" immunity, which is a general response, and "adaptive" immunity, which is tailored towards specific bugs. The mucosa has both. There are neutrophils and macrophages (innate) which live in that area, there are lymphoid patches (like lymph nodes, and adaptive), there are immunoglobulins specific to mucosa (IgA), which are adaptive in nature, and the epithelial lining cells themselves will signal to the rest of the immune system if there is damage or an infection. Plus, saliva itself has chemicals and enzymes which break down oral bugs.



[Figure: Oral immunity]



Most of the oral bacteria themselves are not particularly invasive. Just think, every time you brush your teeth, you cause multiple abrasions in your mouth. In fact, measurable amounts of bacteria from the mouth end up in your bloodstream every time you brush your teeth! And yet, we don't end up with bloodstream infections as a result. This is partially because the rest of our immune system (and the structure of our heart and vessels) is intact, and partially because oral bacteria are not very invasive or pathogenic. They kind of have a sweet deal living in your mouth minding their own business and not killing their host, and even causing infection in an oral wound would make their continued survival less likely.



In addition I should probably point out that skin has a great deal of bacteria on it as well, and yet we rarely get infections from them (unless colonised by an invasive species), for the same reasons.



[Figure: colonisation, invasion, infection]

Friday, 22 May 2009

pathology - What are some of the immediate challenges to break through before finding a cure for mad cow disease?

The main problem is that Mad Cow disease is not caused by a "normal" pathogen but by a prion, a protein.



Traditionally, disease causing agents can be classified into viruses, bacteria, fungi, and parasites. Bacteria, fungi and parasites are all living organisms, alive in the traditional sense. It is, therefore, possible to design drugs that kill them.



Viruses are trickier. While not really alive in the traditional sense, they still have to make copies of their genetic material (DNA, or RNA in the case of many viruses) in order to cause infection. Therefore, drugs like ganciclovir that stop the formation of viral DNA can be an effective treatment.



Prion based diseases such as Mad Cow disease, Creutzfeldt–Jakob disease and Kuru, are not caused by a classical infectious agent but by a simple protein, a prion. Prions are misfolded versions of proteins already present in the host's body. When a prion interacts with the host protein, it causes this protein to adopt the same, misfolded, state as the prion. Protein function depends directly on the protein's structure, the way it is folded in three dimensional space. Therefore, the misfolded protein can no longer carry out its physiological role and disease symptoms occur.



Now, since the prion is just a protein, one single molecule, there is no essential life process that we can disrupt. We have to attack them using chemical agents that target the prion while not affecting the healthy, correctly folded, host protein. This is very hard to do.



The blood-brain barrier mentioned by Larry_Parnell is another problem. Briefly, the BBB is a kind of fence that allows only certain of the various items circulating in the bloodstream to enter the organism's brain. This is a wonderful defensive feature, but it makes it much harder to design drugs that can enter the brain and target pathogens found there.

Thursday, 21 May 2009

genetics - How do PLINK files and HapMap Phased files differ?

According to http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped:




The PED file is a white-space (space or tab) delimited file: the first six columns are mandatory:



 Family ID
Individual ID
Paternal ID
Maternal ID
Sex (1=male; 2=female; other=unknown)
Phenotype


[...]



Genotypes (column 7 onwards) should also be white-space delimited; they can be any character (e.g. 1,2,3,4 or A,C,G,T or anything else) except 0 which is, by default, the missing genotype character. All markers should be biallelic. All SNPs (whether haploid or not) must have two alleles specified. Either Both alleles should be missing (i.e. 0) or neither. No header row should be given. For example, here are two individuals typed for 3 SNPs (one row = one person):



 FAM001  1  0 0  1  2  A A  G G  A C
 FAM001  2  0 0  1  2  A A  A G  0 0
 ...



And here is what I find at the beginning of a HapMap .ped file I got a few years ago (hapmap3_r2_b36_fwd.YRI.qc.poly.ped):




 Y001    NA18488 0       0       2       -9      C C     T T     ...
 Y014    NA18519 0       0       1       -9      C C     T T     ...
 ...



So far, it seems to me that this is plain .ped format: the number of "header" columns is the same, and the file seems to conform to the specifications given in the above-mentioned web page.



Now let's have a look at the .map files.




By default, each line of the MAP file describes a single marker and must contain exactly 4 columns:



 chromosome (1-22, X, Y or 0 if unplaced)
rs# or snp identifier
Genetic distance (morgans)
Base-pair position (bp units)


[...]



Note: Most analyses do not require a genetic map to be specified in any case; specifying a genetic (cM) map is most crucial for a set of analyses that look for shared segments between individuals. For basic association testing, the genetic distance column can be set at 0.



[...]



The autosomes should be coded 1 through 22. The following other codes can be used to specify other chromosome types:



 X    X chromosome                    -> 23
 Y    Y chromosome                    -> 24
 XY   Pseudo-autosomal region of X    -> 25
 MT   Mitochondrial                   -> 26


The numbers on the right represent PLINK's internal numeric coding of these chromosomes: these will appear in all output rather than the original chromosome codes.




Here we have something that may be different.
The end of the .map file corresponding to the HapMap .ped file looks like this:




 26      rs28357376      0       15825
 26      rs2853510       0       15925
 26      rs2854125       0       16149



The HapMap .map file uses "plink's internal numeric coding" for the chromosome instead of the letter code (MT).



Otherwise, it looks like a pretty standard .map file, with no genetic distance indicated.
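To check a file against the two layouts quoted above, a minimal parser can be sketched like this. The function names and dictionary keys are made up for illustration and are not part of PLINK; the sketch only encodes the column rules cited from the PLINK page.

```python
# PLINK's numeric codes for non-autosomal chromosomes, as listed above.
CHROM_CODES = {"X": "23", "Y": "24", "XY": "25", "MT": "26"}

def parse_ped_line(line):
    """Split one .ped row into its six mandatory fields and genotype pairs."""
    fields = line.split()                      # any run of spaces or tabs
    fam, ind, pat, mat, sex, pheno = fields[:6]
    alleles = fields[6:]
    if len(alleles) % 2:
        raise ValueError("each marker needs exactly two alleles")
    genotypes = list(zip(alleles[0::2], alleles[1::2]))
    return {"family": fam, "individual": ind, "father": pat,
            "mother": mat, "sex": sex, "phenotype": pheno,
            "genotypes": genotypes}

def parse_map_line(line):
    """Split one 4-column .map row, normalising the chromosome to the numeric code."""
    chrom, snp, morgans, bp = line.split()
    return {"chrom": CHROM_CODES.get(chrom, chrom), "snp": snp,
            "genetic_distance": float(morgans), "position": int(bp)}

print(parse_ped_line("FAM001 2 0 0 1 2 A A A G 0 0")["genotypes"])
print(parse_map_line("MT rs2853510 0 15925") == parse_map_line("26 rs2853510 0 15925"))
```

Under this sketch, a row using the letter code MT and a row using the numeric code 26 parse to the same record, which is the only difference noted between the two .map files.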

Wednesday, 20 May 2009

biochemistry - Can Pfx polymerase add only one 3' A overhang?

I am trying to clone a PCR product that was amplified using Pfx polymerase into the pGemT vector. I had to A-tail the PCR product using Taq polymerase, since Pfx only generates blunt-ended products. My ligation reaction was successful; however, when I got the sequences, there was an extra T between the 3'-end T of pGemT and the start codon of my product. It only happens at the 3' end of pGemT (or the 5' end of my product). Can Pfx polymerase generate PCR products with just one 3'-end overhang? Or can Taq polymerase add two consecutive dATPs during the A-tailing?



Please write me your thoughts ASAP, it has happened twice and it's a pain already :(



GGC CGC GGG A-T-T-T-G ATG GGA AGC ATG AAG



The two Ts in bold belong to pGemT; the T in italics is the extra T I got from the sequencing reaction. The 5' end of my insert follows the T in italics. The second T (in bold) from pGemT is the one that ligates with the overhanging A added by the A-tailing protocol.

Monday, 18 May 2009

botany - Can Palisade Cells Survive Independently?

I have been intrigued by this question. Can palisade cells survive independently of their parent plant in a chemical environment? For example, if we were to separate a palisade cell from a plant and place it in a sucrose solution, how long would it survive?

ecoli - What makes E. coli yellowish?

Bobthejoe's comment is the best answer so far. Despite many other types of bacterial colonies being "more" yellow than E. coli, E. coli is definitely not white.



Flavins, especially riboflavin, are the predominant compounds responsible for this coloration.

Sunday, 17 May 2009

homework - What range of dose should be used?

In the most basic sense you want to kill the most cancerous cells whilst minimizing the regular somatic cell death. Almost all cancer medications affect regular cells, too - though the better ones do so at a minimum whilst being effective. In reality, it's also nearly impossible to kill all of the cancerous cells. The goal is to bring them below detectable levels, which can allow the body to finish the job. Leaving significant amounts of cancerous cells alive won't do the patient any good - they'll just continue to proliferate and the patient will be back for more operations or treatments soon.



So, with the goal of minimizing benign cell cost and completely eradicating the cancerous line, on your crude chart that falls at about "4".

Friday, 15 May 2009

cancer - What does "tumour budding" mean?

tumour budding, lymphocytic infiltration and resection margins are established factors that influence the outcome of colorectal cancer (1)



In this context what does "tumour budding" mean?



Reference



(1) A. Bolocan, D. Ion, D.N. Ciocan, D.N Paduraru. Chirurgia (2012) 107:555-563. Introduction/0

Thursday, 14 May 2009

molecular biology - Question about equilibrium potential formula

That quasi-travesty is the Nernst equation in $\log_{10}$ for a positive monovalent ion at physiological temperature (37 degrees Celsius), but they've hidden all that from you. Shame on them.



The canonical form of the Nernst equation, for an ion $S$ is



$$
E_{S} = \frac{RT}{z_{S}F}\ln{\frac{[S]_{out}}{[S]_{in}}}
$$



where $R$ is the gas constant, $T$ is temperature expressed in Kelvin, $F$ is Faraday's constant, and $z_S$ is the charge of ion $S$. This is the actually useful form of the equation that can be used for any ion for any temperature.



The Nernst equation is a limiting case of the Goldman-Hodgkin-Katz equation for a single ion. The Nernst equation is useful for directly determining the equilibrium potential of a single ionic species. The GHK equation is used for determining the reversal potential of a membrane or channel in multi-ion cases.



Note that the terms "reversal potential" and "equilibrium potential" are not synonymous, except in the case of a single-ion system. The reversal potential is where the direction of current switches. An equilibrium is when net ion flux is zero (although, in a living cell, it is more appropriate to call even this situation a steady state). If the membrane were permeable to only one ion, the reversal potential would be at the equilibrium potential (given by Nernst) for that single ion. However, when there are several permeant ions, usually none of the ions will be in equilibrium at the reversal potential. That is, all the ions will have a measurable flux across the membrane even though the sum of those fluxes (weighted by permeabilities) is zero at the reversal potential.




To convert the natural log to log base 10, you can use the identity $\log_b{a} = \frac{\log_{10}{a}}{\log_{10}{b}}$ with $b = e$, which gives



$$
E_{S} = \frac{1}{\log_{10}{e}} \cdot \frac{RT}{z_{S}F}\log_{10}{\frac{[S]_{out}}{[S]_{in}}}
$$



In the case of sodium ions (Na+) at 37 degrees Celsius (physiological temperature), $T$ is 310 K and $z_S$ is +1. Substituting in these values, we have



$$
\begin{aligned}
E_{S} &= 2.303 \cdot \frac{(8.3145) (310)}{(+1) (96485)} \text{ V} \cdot \log_{10}{\frac{[S]_{out}}{[S]_{in}}} \\
E_{S} &= 61.5 \text{ mV} \cdot\log_{10}{\frac{[S]_{out}}{[S]_{in}}}
\end{aligned}
$$
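The same substitution can be checked numerically with a short sketch built on the canonical form of the equation. The function name and the example sodium concentrations are assumptions for illustration only.

```python
import math

R = 8.3145      # gas constant, J/(mol*K)
F = 96485.0     # Faraday constant, C/mol

def nernst_mv(conc_out, conc_in, z=1, temp_celsius=37.0):
    """Equilibrium potential in millivolts from the canonical Nernst equation."""
    temp_kelvin = temp_celsius + 273.15
    return 1000.0 * R * temp_kelvin / (z * F) * math.log(conc_out / conc_in)

# A tenfold outward gradient of a monovalent cation recovers the ~61.5 mV slope.
print(round(nernst_mv(10.0, 1.0), 1))
# Illustrative (assumed) sodium concentrations in mM: 145 outside, 12 inside.
print(round(nernst_mv(145.0, 12.0), 1))
```

Because the function keeps $z$ and $T$ as parameters, it also covers divalent ions or other temperatures, which the 61.5 mV shortcut does not.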




YAK uses a less common form of the Nernst equation in membrane physiology which uses the Boltzmann constant directly and deals with elementary charges instead of moles ($R = N_A k_B$). It is a wonderful exercise to derive the Nernst equation from the Boltzmann equation,



$$
\frac{p_2}{p_1} = \exp\left(-\frac{u_2 - u_1}{k_B T}\right)
$$



where $p_i$ is the probability of a particle being in state $i$, and $u_i$ is the energy of state $i$. All it takes is some rearranging and keeping careful track of the units.
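As a sketch of that exercise, under assumptions that are mine rather than YAK's: take state 1 to be the inside and state 2 the outside of the membrane, let the probability ratio equal the concentration ratio, and let the energy of an ion of valence $z_S$ at electrical potential $V$ be $u = z_S e V$, with $e$ the elementary charge.

$$
\begin{aligned}
\frac{[S]_{out}}{[S]_{in}} &= \exp\left(-\frac{z_S e (V_{out} - V_{in})}{k_B T}\right) \\
\ln{\frac{[S]_{out}}{[S]_{in}}} &= \frac{z_S e (V_{in} - V_{out})}{k_B T} \\
E_S = V_{in} - V_{out} &= \frac{k_B T}{z_S e}\ln{\frac{[S]_{out}}{[S]_{in}}} = \frac{RT}{z_S F}\ln{\frac{[S]_{out}}{[S]_{in}}}
\end{aligned}
$$

The last step multiplies numerator and denominator by Avogadro's number, using $R = N_A k_B$ and $F = N_A e$.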

Tuesday, 12 May 2009

hearing - While someone's ears pops with pressure, can he/she hear other sound at the same time?

Yes, of course they can. What happens when your ears feel 'full', like on an aeroplane, is that the air pressure in the middle ear is different from the air pressure outside. When you 'pop' your ears, you push open the Eustachian tubes that connect the middle ear to the throat and make the pressure equal. No matter what the air pressure, the air still conducts sounds. The difference is that your eardrum moves a bit less when the middle ear pressure and the outside pressure differ, which is why sound is muffled before you pop your ears.



[Figure: Eustachian tubes]

rna - Are there any strictly chloroplast/mitochondrion-residing ribozymes?

You've already mentioned rRNA. An interesting review (Tanner, 2006) outlines some more:



  • Group I Introns - self-splicing; "They are abundant in fungal and plant mitochondria..."

  • Group II Introns - some have been shown to self-splice; "...they are found in fungal and plant mitochondria, in chloroplasts of plants ... and especially in the chloroplasts of the protist Euglena gracilis."

  • RNase P - involved in tRNA processing, found in mitochondria

None of these is found strictly in mitochondria or chloroplasts. However, the paper goes on to describe the VS RNA ribozyme:




The mitochondria of certain strains of Neurospora contain the Varkud plasmid (a retroplasmid), which encodes a reverse transcriptase, and a small, unrelated, RNA (VS RNA). The VS RNA is transcribed from circular or multimeric VS plasmid DNA by a mitochondrial RNA polymerase, and the resulting transcripts are subsequently site-specifically cleaved and ligated to form circular, 881 nucleotides long, RNA monomers. These monomers are then reverse transcribed and made double stranded to form the mature VS plasmid.



In vitro transcribed VS RNA precursors are cleaved and ligated by the RNA itself and this is presumed to occur in vivo as well. Of all the self-cleaving RNAs, the catalytic properties of VS RNA are the most poorly understood.





References




Tanner NK. 2006. Ribozymes: the characteristics and properties of catalytic RNAs. FEMS Microbiol Rev. 23(3):257-275.


Monday, 11 May 2009

bioinformatics - How to calculate extent of Sequence similarity

The first thing that comes to mind is to use cross-correlation (CCF).
Essentially, you compare one trace with variously shifted versions of the other to see if there is a correlation between them.



For example (I am using R, but you should be able to adapt this to your software of choice; I have added comments), say A and B are very similar but shifted by a certain amount along the x axis (10 units in the example), and C is extremely different:



# Set the random seed to get a reproducible example
set.seed(12345)
# Number of points per trace
n <- 1000
# All of the possible sensor values
values <- seq(0, 330, 30)
# Sample with replacement to get n random values
A <- sample(values, n, replace=TRUE)
# Let B = A shifted by 10 positions and then change one value every 5
B <- c(A[-1:-10], A[1:10])
B[seq(0, n, 5)] <- sample(values, n/5, replace=TRUE)
# C is a completely different trace
C <- sample(values, n, replace=TRUE)
# Plot the traces (I'm offsetting B and C just for visual clarity)
plot(A, t="l", col="red", lwd=2, ylim=c(0, 1200))
points(B + 360, t="l", col="green", lwd=2)
points(C + 720, t="l", col="blue", lwd=2)

# Now calculate the CCF
c.AB <- ccf(A,B, 100)
c.BC <- ccf(B,C, 100)
c.AC <- ccf(A,C, 100)

# Superimpose the CCF plots
plot(c.AB$lag, c.AB$acf, t="o", col="green", ylim=c(-0.5,1), ylab="CCF", xlab="Lag")
points(c.BC$lag, c.BC$acf, t="o", col="red")
points(c.AC$lag, c.AC$acf, t="o", col="blue")
abline(v=10, col="grey", lty=3)
legend("topleft", c("A-B", "B-C", "A-C"), col=c("green", "red", "blue"), lty=1, lwd=2, pch=20)


The CCF graphs look like this:
[Figure: CCF plot]



This graph means that there is a strong positive correlation (max correlation = 1, here you have 0.8) between A and B and that they are shifted by 10 units. You can see this because the peak is at lag=10, corresponding to the gray dashed line, so the maximum correlation is when you shift trace B by 10 units.



B and C, and A and C, are instead uncorrelated.
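For readers who do not use R, the same idea can be sketched in plain Python. This is my own re-implementation and not R's code: the `ccf` function below is written to follow the convention of R's `ccf()` (correlation of x[t + k] with y[t], one global mean and standard deviation per series, divisor n at every lag). The random values will differ from the R run because the generators differ.

```python
import random
from statistics import fmean, pstdev

def ccf(x, y, max_lag):
    """Cross-correlation of x[t + k] with y[t] for k in -max_lag..max_lag."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    xm = [v - mx for v in x]
    ym = [v - my for v in y]
    denom = n * pstdev(x) * pstdev(y)
    result = {}
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            pairs = zip(xm[k:], ym[:n - k])
        else:
            pairs = zip(xm[:n + k], ym[-k:])
        result[k] = sum(a * b for a, b in pairs) / denom
    return result

random.seed(12345)
values = list(range(0, 331, 30))              # same sensor values as the R example
A = [random.choice(values) for _ in range(1000)]
B = A[10:] + A[:10]                           # A shifted by 10 positions
for i in range(4, 1000, 5):                   # then change one value in every 5
    B[i] = random.choice(values)
C = [random.choice(values) for _ in range(1000)]

ab, ac = ccf(A, B, 100), ccf(A, C, 100)
best_lag = max(ab, key=ab.get)
print(best_lag, round(ab[best_lag], 2))               # lag of the A-B peak and its height
print(round(max(abs(v) for v in ac.values()), 2))     # largest A-C correlation at any lag
```

The A-B peak should sit at lag 10, while the A-C values should stay close to zero at every lag.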

Friday, 8 May 2009

botany - What is the cause of the spots on this leaf?

This is "tar spot", a disease usually found in Europe and North America. It mostly affects maple leaves. Tar spot is caused by Rhytisma acerinum, a fungal plant pathogen. The pathogen does not seem to harm the tree itself, but it damages the leaves when it finds suitable conditions: summer with a bit of wetness. It enters the leaves through the stomata and creates yellow lesions of various sizes over the leaf area, which gradually turn into brown-black, tar-colored spots. It reduces photosynthesis in the leaves as the dark spots widen. After some time the leaves fall.



Full details on the pathogen are available in the Wikipedia link. (Add some more info if you find this is not enough detail.)

Thursday, 7 May 2009

homework - Recent and good quality articles on systems biology

I'll have a go at a short list. This is more of a highlight and primer sort of response to a somewhat subjective question so pardon if this isn't what you wanted. I don't have paper references for all this, but I'll try to come back and add some later - I usually only have half an hour or so to write an answer so bear with me.



Synthetic biology, defined as the use of genes and promoters to engineer cells or even multicellular organisms like devices, is a major category. I like browsing the projects from the iGEM competitions. A couple of outstanding efforts in this field are the use of photoreceptors to program E. coli gene expression and the general effort to create logic and computational circuits in the cell.



A major application in systems biology is engineering cells to produce new chemical compounds or to overproduce compounds. A classic example (sort of old, but still pretty outstanding) is the engineering of E. coli to produce the antimalarial drug artemisinin at levels which would enable worldwide release of the drug. This has inspired efforts to produce fuel from algae and bacteria as well.



If you define Systems biology as being often concerned with modeling some or even all the biological processes of the cell, there are probably too many major efforts to cite, but this is just a list of some favorites.



There are lots of papers focused on flux balance analysis which models how the metabolic machinery processes and synthesizes all the scores of chemical compounds that make up a cell.



Another category of systems biology tries to take the genes found in a genome and model the actions of all the genes. One of the most exciting papers to come out in this sort of systems biology is the whole-cell model of Mycoplasma genitalium. Using data from a massive effort to characterize every gene in this very small genome, the model consists of over 20 specific models which have been combined to make a very impressive simulation of the entire cell dividing.



Then there is metagenomics, which looks at the different populations of microorganisms and tries to look at how they vary with different environmental conditions. A recent paper that's exciting is the review of how different bacteria dominate in the gut of obese mice and people (several references in the link). A tour de force in metagenomics was the global sea survey.

Sunday, 3 May 2009

bioinformatics - Standard letter for 5-methylcytosine

Well, don't use M or B; those are already taken (C or A, and not A, respectively). You can see the full list here: http://www.dna.affrc.go.jp/misc/MPsrch/InfoIUPAC.html (The enWiki article on Nucleobases lists a few others, but I would ignore those as 1. D is present in both and 2. they are rare and inapplicable.)



5-methylcytosine isn't on there. If you want to be pedantic about it, 5-methylcytosine is an epigenetic marker and as such is by definition not a genetic sequence; that remains simply a C and, genetically, the sequence is the same, despite the fact that it may indeed make a difference.



Most of the time people use m5C, so I'd go with 5 if I were you. That certainly isn't used for anything else, and if you must use a single character, almost anybody will know what you are talking about.
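A quick way to check a candidate symbol against the codes already in use is sketched below. The table holds the standard IUPAC nucleotide codes; the helper name is made up for illustration.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def is_free_symbol(symbol):
    """True if the symbol is not already claimed by an IUPAC nucleotide code."""
    return symbol.upper() not in IUPAC

for candidate in "MB5":
    print(candidate, is_free_symbol(candidate))
```

M and B come back as taken and 5 as free, which matches the suggestion above.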

Saturday, 2 May 2009

biophysics - How to quantitatively measure work done by a biological system?

Measuring the work done by a biological system seems pretty impossible. Imagine how many different ways one cell of your body uses energy (ATP). You can't really measure all the work done by every cell on a macro scale. Metabolic efficiency has been defined as... "health". That seems just a little ambiguous. That's why we use things like averages to determine if energy use is normal or not, such as in metabolic age.



In short, work is a more tangible term in discrete physics examples, but there is so much complicated energy use in biological systems that total systemic work can't be easily defined.

evolution - What does fitness really mean?

Fitness is certainly the most important concept in the theory of evolution. My question does not have to do with practical measures of fitness but with the theoretical definition of it.



I am a bit lost with the concept of fitness. Below I give some possible definitions of fitness, and I expect criticism of these definitions. What is the definition of fitness?



In the following, to make things easier, I will consider only one bi-allelic locus. As I said, I don't want to talk about practical issues but theoretical ones; therefore we will assume a panmictic population of infinite size, evenly distributed across age classes.



1) Let the variable $M$ be the mean generation time of the species of interest. Measure the allele frequency $p(t)$ and then the frequency $p(t+M)$. By comparing the two, you get the ratio of the fitnesses of the two genotypes. Obviously this does not work if we have sexual reproduction or diploid selection.
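To make (1) concrete, here is a sketch under assumptions that go beyond the text above: haploid selection, discrete non-overlapping generations, and constant fitnesses $w_A$ and $w_a$ (symbols introduced here only for illustration). After one generation,

$$
p(t+M) = \frac{p(t)\, w_A}{p(t)\, w_A + (1 - p(t))\, w_a}
$$

so the fitness ratio is the factor by which the allele's odds change:

$$
\frac{w_A}{w_a} = \frac{p(t+M)\,/\,(1 - p(t+M))}{p(t)\,/\,(1 - p(t))}
$$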



2) Genotype the individuals, wait until they die while counting the number of offspring they had, and take the mean per genotype ($W_{AA}$, $W_{Aa}$, $W_{aa}$). This method would not work if they have different lifespans. One individual might have two babies in 8 years and another two babies in 2 years, but they would have the same fitness.



3) Do the same as in 2), but don't wait until they die. Wait 1 month? Or 2 years? Or $M$? Or $2\cdot M$? Or...? What is the right decision? The longer you wait, the closer you will be to the long-term effect of natural selection. The less you wait, the more likely you are to be affected by genetic drift.



4) Fitness is just a measure of natural selection that is not perfectly accurate because it is measured over the short term. In some circumstances, our short-term measure is representative enough (Wright-Fisher equation) of what will happen in the long term.

genetics - How does a new species survive without suffering inbreeding?

This is indeed a very good question that I have spent a long time thinking about myself. My take on this is that there is indeed a very close relationship between inbreeding and speciation, but that inbreeding actually PRECEDES speciation! The key to this rather counter-intuitive point of view is that inbreeding actually has several advantages, including that of resulting in cleaner genomes in those offspring that survive inbreeding depression.



In populations that have high degrees of inbreeding, because of, for example, small population sizes or a strong tendency to self-fertilisation, the recessive mutations that cause inbreeding depression will be progressively eliminated, and there will consequently be very little inbreeding depression.



In such a context, there will thus be no barrier to exploiting other advantages of inbreeding such as reducing the cost of sex, or keeping together advantageous gene combinations. Small groups of individuals would thus be better off breeding among one another than with the ancestral stock, leading to speciation.



If you are interested, or just intrigued, by this kind of concept, I invite you to read the rather long essay I wrote on the subject (The existence of species rests on a metastable equilibrium between inbreeding and outbreeding. An essay on the close relationship between speciation, inbreeding and recessive mutations, Etienne Joly, http://www.biologydirect.com/content/6/1/62 ).
And please do not hesitate to get in touch with me directly by email ( atnjoly(at)mac.com ) if you have comments or any further questions after reading this essay.