Tuesday, June 24, 2014

Seguin Moreau

Remy Petit and I met today with colleagues at Seguin Moreau, one of the largest wine-barrel producers in France. Wine barrels are made of white oak, in particular of Quercus petraea, which is higher in whiskey lactone than Quercus robur (the other common French oak) and American white oak (Q. alba). The day was fascinating both for the oaks themselves and for the barrel-making. The photos below show only some of the highlights of barrel-making. Any errors are my own, as this text has not been reviewed.


Andrei and Remy inspecting incoming wood.
Wood that comes to the factory stands outside for several years (about 1 year for every centimeter of thickness). The alternation between saturation and drying draws secondary compounds out to the surface of the board, where they are washed away by the rain. The thicker boards here are used for large fermenting vats (see below), while the thinner boards...

... are used for standard wine barrels. The foam visible in the water running off the wood at left is caused by the tannins.

To keep the wood at an appropriate moisture level for barrel construction, flats of wood are stored under a large, open-air roof. The topmost board on the middle stack here shows a slight warp; this is fine, and it results from the fact that Q. petraea boards are cut to follow the grain, to ensure water-tightness. This is not necessary in Q. alba, which has larger tyloses. As a consequence, only about 25% of the original log makes it to this stage in Q. petraea, whereas about 60% of a Q. alba log does (to my recollection; these numbers could be a bit off).

Once inside, the boards are shaped into staves, and the staves are lined up on a barrel-sized pattern. There is about 1 mm of wiggle room in this process.

Batches of staves are then assembled into barrels, which are open at the bottom, pounded into alignment, and then moved off to the next stage...



... where they are toasted over an open flame. During toasting, the bottoms of the barrels are hugged into shape.


Toasting and hugging.


The resultant barrel smells like warm bread inside. It's delicious.

The edges of the barrel are then beveled, and a rim is cut to accept the top.


The barrel tops are made of a variety of woods, including Robinia pseudoacacia, depending on the wine variety. The wood is pressed together with reeds between the boards to ensure that the top is airtight. This is the only monocot I noticed in the factory.

The tops are cut round and pounded into place, with a sort of bread dough (truly! flour and water) between the top and the barrel proper to ensure a tight seal.

The barrel is sanded smooth...



... laser-printed with the Seguin Moreau logo...

... pressure-tested twice to ensure air-tightness, and then packaged up for shipping.

The largest barrels are truly enormous. As I understand it, the barrels below are used for fermenting chambers.

Thursday, April 3, 2014

FST distributions, between vs. within clades

The pattern of among-clade divergence becomes more interesting when I compare the red and white oak clades. Here I looked at FST within the white oaks, splitting the Mexican / AZ clade from all other white oaks (except Virentes), and FST within the red oaks, splitting the Mexican / AZ clade from all other red oaks (except Q. palustris and allies). These were the outcomes I imagined, and why I suspected each of them:

  1. If diversification is neutral, then I expect conserved regions of the genome to be, on average, shared between the white and red oaks. This would be reflected by a positive correlation between FST within the white oaks and FST within the red oaks.
  2. If diversification is driven by divergent selection on some regions of the genome but not others, I might find either:

    a. positive correlation, if the same regions of the genome were under divergent selection during diversification of the major white oak clades as were under selection during divergence of the major red oak clades; or

    b. negative correlation, if white oak divergence was driven by selection at different loci than red oak divergence.
I subsetted the ca. 33,000 RADseq loci from my last clustering analysis to loci that were present in at least 5 members of each of the two white oak clades or at least 5 members of each of the two red oak clades, and that were variable at one or more nucleotide positions. Then I mapped these all back to the Q. robur SNP map (800-bp contigs), as described in the 2014-03-10 post. A different set of markers maps successfully in each case. Here are the markers that map in the two groups:
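The locus filter described above can be sketched in Python. This is a minimal, hypothetical sketch rather than the pipeline actually used: the data layout (a dict from sample name to aligned sequence, with `None` for samples missing at the locus) and the names `n_present`, `is_variable`, and `keep_locus` are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

# Hypothetical layout: sample name -> aligned RAD sequence, or None if the
# sample has no data at this locus.
Locus = Dict[str, Optional[str]]
Comparison = Tuple[Sequence[str], Sequence[str]]  # (clade A samples, clade B samples)


def n_present(locus: Locus, clade: Iterable[str]) -> int:
    """Number of clade members with sequence data at this locus."""
    return sum(1 for sample in clade if locus.get(sample) is not None)


def is_variable(locus: Locus) -> bool:
    """True if at least one aligned column has two or more distinct nucleotides."""
    seqs = [seq.upper() for seq in locus.values() if seq is not None]
    for column in zip(*seqs):
        if len({base for base in column if base in "ACGT"}) > 1:
            return True
    return False


def keep_locus(locus: Locus, comparisons: List[Comparison], min_per_clade: int = 5) -> bool:
    """Keep a variable locus present in >= min_per_clade members of BOTH clades
    of at least one comparison (e.g. the white oak pair OR the red oak pair)."""
    if not is_variable(locus):
        return False
    return any(
        n_present(locus, clade_a) >= min_per_clade and n_present(locus, clade_b) >= min_per_clade
        for clade_a, clade_b in comparisons
    )
```

Passing both the white oak pair and the red oak pair in `comparisons` reproduces the "or" in the criterion: a locus qualifies if it is well sampled in either comparison.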

FST by linkage group within Quercus section Lobatae, where the two populations are the eastern North American red oaks (excluding Q. palustris and allies) and the Mexican / Arizonian red oaks.

FST by linkage group within Quercus section Quercus, where the two populations are the eastern North American white oaks (excluding Virentes and the Roburoids) and the Mexican / Arizonian white oaks.

I looked first to see whether there is any kind of correlation on a locus-by-locus basis, but this is noisy:

Biplot of FST within section Quercus (y axis) against section Lobatae (x axis)

But we have a map! Binning loci into 3 cM windows and averaging FST within each bin still gives a pretty noisy plot:


But, remarkably, the story seems to be a lot cleaner at bin sizes of 30 cM or higher:


Map position of divergence in 30 cM windows, red oaks and white oaks, with all 12 linkage groups concatenated.
Biplot, FST of red oaks on FST of white oaks, 30 cM windows. Pearson's r = -0.583, P = 0.0028.
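The windowed comparison can be sketched as follows. This is a hypothetical Python sketch, assuming each locus has already been placed on a single concatenated map coordinate (linkage groups laid end to end, so a window may straddle two groups); the function names are illustrative. Because a different set of markers maps in each group, only windows containing markers from both groups are compared. The sketch returns Pearson's r and the number of shared windows; a P value like the one above would additionally need a t test on r with (windows - 2) degrees of freedom.

```python
import math
from collections import defaultdict
from typing import Dict, Sequence, Tuple

MapFst = Tuple[Sequence[float], Sequence[float]]  # (positions in cM, FST per locus)


def bin_means(positions_cm: Sequence[float], fst: Sequence[float], width_cm: float) -> Dict[int, float]:
    """Mean FST within each window of width_cm along the concatenated map."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for pos, value in zip(positions_cm, fst):
        window = int(pos // width_cm)
        sums[window] += value
        counts[window] += 1
    return {w: sums[w] / counts[w] for w in sums}


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def binned_correlation(red: MapFst, white: MapFst, width_cm: float) -> Tuple[float, int]:
    """Correlate window-averaged FST between the two groups over shared windows."""
    red_bins = bin_means(*red, width_cm)
    white_bins = bin_means(*white, width_cm)
    shared = sorted(red_bins.keys() & white_bins.keys())
    r = pearson_r([red_bins[w] for w in shared], [white_bins[w] for w in shared])
    return r, len(shared)
```

Calling `binned_correlation` with `width_cm=3` and then `width_cm=30` on the same data is the comparison between the noisy fine-scale plot and the cleaner coarse-scale one.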

I had expected an effect like this, if present at all, to show up at fine scales, due to selection on individual genes. In both groups we are looking at a cladogenetic split that is probably > 10 million years old (the split between the red and white oaks is about 30 million years old). Is there any chance, though, that we are really picking up on a strong divergence in the selective pressures driving divergence at the bases of the red and white oak clades, and that this affected divergence patterns across half-chromosome-sized blocks of the genome?

Friday, March 28, 2014

Where is the divergence? v1

Okay, here's where I had reached last week:

This is a map of FST (y-axis) against map position. Of course, what it mostly reflects is the contrast between low-divergence (low phylogenetic informativeness) loci, with FST near zero, and loci that do a good job of distinguishing the reds and the whites. A fixation index is probably not what I want for this project. It does, however, show us which loci are strongly divergent. I wonder what this would look like if we mapped (1) the red-white break, (2) a major within-white break, and (3) a major within-red break on the same plot?
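The posts don't say which FST estimator was used. As one hypothetical choice, here is a Python sketch of Hudson's FST for biallelic SNPs, which shows the behavior described above: SNPs with the same allele frequency in both groups give values near zero, and fixed differences give 1. The function names are illustrative, the sketch omits any small-sample correction, and combining SNPs within a RAD locus as a ratio of sums (rather than a mean of per-SNP ratios) is a design choice, not something taken from the post.

```python
import math
from typing import Iterable, Tuple


def hudson_terms(p1: float, p2: float) -> Tuple[float, float]:
    """Numerator and denominator of Hudson's FST for one biallelic SNP,
    given the frequency of the same allele in groups 1 and 2."""
    numerator = (p1 - p2) ** 2                   # between-group minus within-group diversity
    denominator = p1 * (1 - p2) + p2 * (1 - p1)  # between-group diversity
    return numerator, denominator


def locus_fst(snp_freqs: Iterable[Tuple[float, float]]) -> float:
    """FST for a RAD locus: sum the SNP numerators and denominators, then divide."""
    num = den = 0.0
    for p1, p2 in snp_freqs:
        n, d = hudson_terms(p1, p2)
        num += n
        den += d
    return num / den if den > 0 else math.nan  # nan if no SNP is polymorphic across groups
```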

Okay: let me bin these individuals into clades.

Monday, March 10, 2014

How much overlap?

354 loci in the last RAD dataset have at least 10 red oaks and 10 white oaks and are mapped to the Quercus robur contigs.

RAD map v1

Last week I mapped the RAD sequences back to the oak contigs. After clustering the contigs, there are 3327 unique contigs, of which 23 were removed because they either don't have a map position or map to two positions > 3 cM apart. So we now have 803 RAD loci mapped to 587 Quercus robur contigs... that's a lot farther along than we were two weeks ago. To do, first four days of this week:

  • Quantify divergence between any two groups of individuals defined on the tree for mapped loci
  • Plot divergence on this map
  • Relate this map to the Q. rubra map
  • Start analyzing the fuller set of RADs on the Morton server
  • Work with Isabella on mapping the larger contigs... status?

Then Jeanne comes on Thursday to start working together on the Q. rubra / Q. robur map.
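The contig filter described above (dropping contigs that have no map position or whose map hits lie more than 3 cM apart) can be sketched in Python. The data layout and function names are hypothetical, treating hits on different linkage groups as "distant" is an assumption, and collapsing the surviving hits to their mean position is an illustrative choice that the post does not specify.

```python
from typing import Dict, List, Optional, Tuple

Position = Tuple[str, float]  # hypothetical: (linkage group, map position in cM)


def resolve_position(hits: List[Position], max_spread_cm: float = 3.0) -> Optional[Position]:
    """Return a single map position for a contig, or None if it should be dropped."""
    if not hits:
        return None  # no map position
    if len({lg for lg, _ in hits}) > 1:
        return None  # hits on different linkage groups
    cms = [cm for _, cm in hits]
    if max(cms) - min(cms) > max_spread_cm:
        return None  # hits too far apart on the same linkage group
    return hits[0][0], sum(cms) / len(cms)  # average the nearby positions


def filter_contigs(contig_hits: Dict[str, List[Position]]) -> Dict[str, Position]:
    """Keep only contigs with a single, unambiguous map position."""
    kept: Dict[str, Position] = {}
    for contig, hits in contig_hits.items():
        pos = resolve_position(hits)
        if pos is not None:
            kept[contig] = pos
    return kept
```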


Tuesday, March 4, 2014

RADami on CRAN

http://cran.r-project.org/web/packages/RADami/index.html

Notes on Quercus types at P – 2014-02-18

This is a late post! Just to get these in the record before I move on to the next thing...

Quercus excelsa Liebm. Isotype : P00754092
Quercus falcata Michx. Isotype: P00754029 (C.H. Muller, 1958); I don’t know the variability in this species, but this looks to me to be a mixed collection. Also listed as isotypes in the P catalog, but not indicated as such on the sheets: P00754031–33.
Quercus triloba Michx. 3 isotypes : P00754026, 8 (C.H. Muller, 1958), unbarcoded (anonymous det.) ; 1 unspecified type: P00754027 (C.H. Muller, 1958).
These have a look intermediate between Q. falcata and Q. marilandica.
*** Synonym of Q. falcata Michx. (Muller dets.)
Quercus galeottii Mart. 2 isotypes : P00754036, 7.
Quercus furfuracea Liebm. 2 types (unspecified) : P00754034, 5.
Quercus germana v. echidiea Trel. 1 isotype: P00754016.
*** Previously det. as Quercus subsquarossa [?] A.Camus, by A. Camus.
Quercus ghiesbreghtii Mart. & Gal. 2 isotypes: P00754017, 8.
Quercus glabrescens Benth. Isotype: P00754019.
Quercus conjungens Trel. Isotype: P00754025
*** Synonym of Q. glaucoides Mart. & Gal., following Govaerts and Frodin 1999.
* Filed under Q. glaucoides.
Quercus glaucescens H. & B. Isotype: P00129726 (Bonpland collection, with written notes)
Quercus oerstediana Liebm. var. crenifolia Trel. Type (unspecified): P00754020.
*** Ghiesbreght label (1842) dets as Q. synthetic Trel., and indicates this is a synonym of Q. oerstediana Liebm. non R.Br. var. crenifolia Trel.
*** Filed under Q. glaucescens.
Quercus glaucophylla V.Seem. Isotype : P00754024 (Breedlove, 1987)
*** Synonym of Q. glaucoides Mart. & Gal. (Breedlove, 1987)
Quercus greggii (A.DC.) Trel.
Quercus revoluta var. dysophyllopsis Trel. 2 isotypes : P00754100, 1. (A. Camus)
*** Synonym of Q. greggii (A.DC.) Trel. (Breedlove, 1987)
Quercus loesenerii Trel. Isotype: P00754099.
*** Synonym of Q. greggii (A.DC.) Trel. (Breedlove, 1987)
Quercus grahamii Benth. var. brevipes Trel. isotype : P00754098 (Breedlove, 1987).
Quercus hahnii Trel. Isotype: P00754102
Quercus hartwegii Benth. Isotype: P00754093.
*** Synonym of Q. obtusata Bonpl., following Govaerts and Frodin 1999.
Quercus humboldtii Bonpl. Isotype: P00129753
Quercus tolimensis H.&B. Isotype : P00129751.
*** Synonym of Q. humboldtii (C.H. Muller, 1958).
Quercus lindenii A.deC. Isotype: P00754103
*** Synonym of Q. humboldtii (C.H. Muller, 1958).
Quercus impressa Trel. Isotype: P00754104
Quercus cinerea var. humilis Michx. 2 isotypes: P00754106, 7 (C.H. Muller, 1958).
* Filed under Q. incana Michx.
*** Q. cinerea Michx. is a synonym of Q. incana Michx. (Tropicos)
*** Det as Q. pumila Walt., Q. cinerea humilis Michx. (C.H. Muller, 1958).
*** Quercus cinerea Michx. v. humilis (Walter) A.DC. is the ref in Tropicos.
*** Authorship unclear
Quercus cinerea Michx. 4 isotypes: P00754108—11.
* Filed under Q. incana Michx.
*** Q. cinerea Michx. is a synonym of Q. incana Michx. (Tropicos)
Quercus insignis M. & G. Isotype : P00754113
Quercus intricata Trel. 2 types (unspecified and perhaps depauperate ; how small are the leaves on this sp. ?) : P00754114, 5.
*** Originally det as Q. microphylla Née by Pringle.
Quercus laeta Liebm. var heterophylla Trel. Type (unspecified) : P00754119. Det as Q. laeta f. [forma? deliberately left blank?] by Trelease, 1913.
Quercus jaralensis Trel. 2 isotypes : P00754116, 7.
Quercus lanceolata H.&B. 2 types (unspecified) : P00129724, 5 (the first a Bonpland collection with notes), 1 isotype: P00129719 (Breedlove, 1987; det as Q. laurina H.&B., Muller 1958)
*** synonym for Q. laurina H.&B. (Breedlove, 1987).
Quercus lancifolia Cham. & Schl. 1 possible lectotype : P00754121; 1 lectotype : P00754120 (Breedlove 1987); det A. Camus and there designated “Type”
Quercus lanigera Mart. & Gal. Isotype: P00754122.
*** Breedlove (1987) annotated other sheets in this folder as Q. castanea Née, but someone came along and annotated them to Q. lanigera M&G (nomenclatural).
Quercus lecomteana Trel. 2 isotypes : P00754127, 8 (Trelease, 1913) ; 1 type : P00754126 (Trelease, 1913).
*** Breedlove (1987) annotated to Q. repanda H&B (nomenclatural).
Quercus laurina H.&B. 3 isotypes (Bonpland 4143: P00129720,1,2); 1 type: P00129723.
Quercus leiophylla A.DC. Isotype : P00754131.
Quercus leiophylla A.DC. f. subintegra Trel. Isotype : P00754130 (Breedlove, 1987).
*** both of the above det to Q. lancifolia S.&C. (Breedlove 1987), then back Q. leiophylla A.DC. following Govaerts and Frodin 1999.
Quercus olivaeformis Michx. Type (unspecified and unscanned) – Photo taken.
*** Synonym of Quercus macrocarpa
Quercus mollis Mart. & Gal. Type ?: P00754134 (Trelease 1913). But annotations by Trelease and C.H. Muller (1958) suggest that in fact Galeotti 104 is the isotype (this collection is Galeotti 103).
*** Annotated to Q. crassifolia H.&B. (Breedlove 1987), then to Q. chicamolensis Trel. following Govaerts and Frodin 1999.
Quercus mexicana f. lanosa Trel. Isotype : P00754133 (Breedlove 1987). Det Trelease, 1913
*** Det to Q. crassipes (C.H. Muller 1958, Breedlove 1987), then there is a synonym det slip to Q. mexicana H.B.K. following Govaerts and Frodin 1999.
Quercus mexicana H.&B. 3 isotypes : P00129716, 7, 8 (duplicates of Bonpland 4060).
Quercus panduriformis Trel. 2 types (indicated on one of labels, and these are duplicates, but not scanned or barcoded or stamped “TYPE”).
*** Both det to Q. magnoliaefolia Née (Muller 1958).
Quercus obovalifolia Fourn. 3 isotypes: P00754135, 6, 7.
* Filed under Q. oblongifolia Torr.
*** WCSP gives Q. crassipes as the accepted name for this species.

Quercus obtusata H.&B. 3 isotypes: P00129713, 4, 5.

Monday, February 17, 2014

Notes on Quercus types at P – 2014-02-17

** indicates refiling done or needed, or notes on filing locations


Quercus cubana. Two types of Q. cubana at P (P00754079, P 0075080), and I see that WCSP lists Q. sagraeana Nutt. as a synonym of Q. cubana A.Rich., with Q. cubana having an 1841 publication date (where Q. sagraeana has a publication date of 1841). Presumably Nixon cleared this up in his dissertation, but I don't have access to that here. Is there any risk that this is still an open question? If it's settled, we should let the folks at Tropicos and WCSP know. If not, we should address it briefly in the paper (this isn't the paper to deal with such things, but it needs to be addressed if there is any ambiguity).
Quercus longifolia Liebm. 2 syntypes (P00754040, P00754041). Filed under Q. acatenangensis Trel.
Quercus acherdophylla Trel. 3 isotypes (P00754042—4). Correct name fide Tropicos: Q. salicifolia Née
Quercus affinis M.Martins & Galeotti non Scheid. 1 isotype (P00754132). Synonym for Q. martensiana Trel.; tangled synonymy fide Tropicos: also Q. peduncularis Née and others. See Breedlove, D.E. 1987, Monographic studies of Neotropical Quercus (Cal Acad?). This specimen is clearly a white oak, while Q. affinis Scheidw. is a red oak.
** Separated Q. affinis M.Martins & Galeotti from Q. affinis Scheidw.
Quercus nitens M.Martins & Galeotti. Isotype : P00754045 [** Filed under Q. affinis Scheidw.]
Quercus nitens subintegra A.DC. Isotype : P00754046 [** Filed under Q. affinis Scheidw.]
Quercus agilops [sic.]. There were two folders labelled with this name, mostly filled with Q. montana. There are four homonyms of Q. aegilops out there, but all reference Cerris spp. ** Refiled.
Quercus ambigua H.B.K. 1 type: P00129745. Indicated on sheet as synonym of Q. rugosa Née.
Quercus alpescens Trel. Isotype: P00754039.
Quercus axillaris E.Fourn. ex Trel. Det on sheet as Q. castanea Née by Muller; this is the accepted name fide The Plant List. Isotype : P00754038.
** Moved the two Q. axillaris together into one folder ; same collection, filed under both Q. axillaris and Q. castanea.
Quercus barbinervis Benth. 1 isotype : P00754123. Indicated on sheet to be synonym of Q. laurina H.&B. (by Breedlove, 1986).
Quercus benthamii A.DC. 1 isotype: P00754047. Originally det by Hartweg as Quercus undulata Benth.
Quercus bourgeai Gerst. 3 isotypes, all duplicates of M. Bourgeau 1013: P00754048–50; also, one unmounted and unaccessioned sheet of the same collection (photographed). Indicated on sheet to be a synonym of Q. laurina H.&B. (Breedlove, 1986).
Quercus bourgeai Gerst. var. ilicifolia Trel. 1 isotype: P00754051. Indicated on sheet to be synonym of Q. laurina H.&B. (Breedlove, 1986).
Quercus boyacanus Cuatr. 1 isotype: P00754052.
Quercus brandegei Goldm. 2 sheets, no collecting date, but old. Only to “Basse-Californie”, by M.L. Diguet, HERB. MUS. PARIS. Probably a date could be assigned.
Quercus canbyi Trel. 2 isotypes: P00754053, 4. Originally det. as Q. grahamii Benth. by C.G. Pringle.
------ below this point, Breedlove dets are consistently indicated; above, inconsistent ------
Quercus canbyi Trel. forma berlandieri Trel. Isotype : P00754055. (Breedlove 1987)
Quercus candicans Née f. michoacana Trel. Isotypes, all duplicates of Pringle 3955 : P00754059—61. (Breedlove 1987)
Quercus acuminata M.Martins & Galeotti. Isotype: P00754056. Filed under Q. candicans Née and listed as illegitimate in The Plant List.
Quercus alamo Bentham. Type (unspecified) : P00754057. Filed under Q. candicans Née
Quercus intermedia M.Martins & Galeotti. Isotype : P00754058. Filed under Q. candicans Née
Quercus candolleana Trel. 2 isotypes: P00744195, 6. Det as Q. acutifolia Née (Breedlove 1987). Later synonymized (on sheet) to Q. vexans Trel following Govaerts and Frodin 1999.
Q. carmenensis C.H.Mull. 1 type (unspecified): P00754062.
Q. rossii f. arsenii Trel. Isotype: P00754063. Synonymized to Q. diversifolia Née (Breedlove 1987); Synonymized to Q. castanea Née following Govaerts and Frodin 1999.
Q. pulchella H.&B. Type (unspecified): P00129710. Synonymized to Q. castanea Née following Govaerts and Frodin 1999.
Q. axillaris E.Fourn. ex Trel. Isotype : P00754064. ** Filed under Q. castanea; indicated to be Q. castanea on det (C.H. Muller 1958; J.R. Bacon 2004).
Q. chihuahuensis Trel. 3 isotypes, all duplicates of Pringle 355: P00754065–7. Originally det as Q. undulata Torr. var. breviloba Engelm.
Q. chrysophylla H.&B. 3 isotypes, including 2 duplicates of Bonpland 4062: P00129742—4.
Quercus coccolobifolia Trel. 1 type (unspecified): P00754118. Synonym of Q. jonesii Trel.
Q. praineaya Trel. 2 isotypes, both duplicates of Pringle 8854: P00754068,9. Synonym of Q. coffeicolor Trel.
Q. colombianus Trel. Isotype: P00754070
Q. conglomerata Trel. Isotype : P00754165 (Breedlove 1987). Synonym of Q. rugosa Née (Breedlove 1987). Previously det. as Q. reticulata (C.H. Muller, 1958).
Q. nitida M.Martins & Galeotti. Type (unspecified) : P00754071. Synonym of Q. conspersa Benth. following Govaerts and Frodin 1999. ** filed under Q. conspersa
Quercus convallata Trel. Type (unspecified) : P00754073.
Quercus corrugata var. graminiflora Trel. Isotype : P00754075 (C.H. Muller, 1958).
Quercus corrugata Hooker. Isotype : P00754074 (C.H. Muller, s.d.).
Quercus orbiculata Trel. Isotype : P00754077. Synonym : Q. crassifolia Bonpl. ** Filed under Q. crassifolia.
Quercus stipularis H.&B. 3 isotypes : P00129748—50. ** Filed under Q. crassifolia.
Quercus crassifolia H.&B. Type (unspecified): P00129735—7.
Quercus spinulois H.&B. Isotype : P00754078. Synonymized to Q. stipularis H&B (CH Muller, 1958), then to Q. crassifolia H&B (Breedlove, 1987). ** Filed under Q. crassifolia
Quercus confertifolia H.&B. Isotype : P00129739 ; Type (unspecified) : P00129738. Synonym of Q. crassipes Bonpl. ** Filed under Q. crassipes Bonpland.
Quercus crassipes var. angustifolia H.&B. 4 isotypes : P00129730—33 (the last annotated by CH Muller, 1958); 1 type (unspecified): P00129729. Synonym of Q. crassipes Bonpl. ** Filed under Q. crassipes Bonpland.
Quercus splendens var. pallidior A.DC. Isotype: P00754076 (Breedlove 1987). Synonym of Q. crassifolia H&B (Breedlove 1987). ** Had been filed under Q. crassipes Bonpland; I refiled it to Q. crassifolia.
Quercus subavenia Trel. Isotype: P00754081 (A. Camus, s.d.). Synonym of Q. depressa H&B.
Quercus depressa H.&B. 2 isotypes (Bonpland 4145): P00129727, 8.
Quercus fournieri Trel. Holotype : P00077457 (Breedlove 1994). Originally det as Q. ferruginea H&B. ** filed under Q. dysophylla Benth. (but not redet. to this by Breedlove). The Plant List treats both as hybrid names, with Q. x fournieri a synonym of Q. x dysophylla.
Quercus sideroxyla H&B f. ciliifera Trel. Syntype : P00754082 (A. Camus). Synonymized to Q. eduardii Trel. (Breedlove, 1987 ; Bacon, 2004).  ** Filed under Q. eduardii
Quercus langlassei Trel. 2 Isotypes : P00754088, 9. Synonym of Q. elliptica Née (Govaerts and Frodin, 1999). ** Filed under Q. elliptica.
Quercus chiquihuitilloi Trel. 2 isotypes : P00754083, 4. Synonym of Q. elliptica Née (Breedlove, 1987).  ** Filed under Q. elliptica.
Quercus oajacana Liebm. isotype : P00754086. Synonym of Q. elliptica Née (Breedlove, 1987).  ** Filed under Q. elliptica.
Quercus linguaefolia Liebm. Isotype : P00754087. Synonym of Q. elliptica Née (Breedlove, 1987).  ** Filed under Q. elliptica.
Quercus pubinervis M.Martins & Galeotti. Isotype : P00754085 (Breedlove, 1987). Synonym of Q. elliptica Née (Breedlove, 1987).  ** Filed under Q. elliptica.
Quercus emoryi Torr. var. chihuahuensis Trel. Isotype: P00754091. Synonym of Q. chihuahuensis Trel. (J.R. Bacon, 2004). At E, this thing seems to be annotated to Q. grisea by Breedlove (1987). ** Filed under Q. emoryi. ** Move to Q. chihuahuensis?

Quercus emoryi Torr. var. san-ysidroana Trel. Isotype: P00754090. Synonym of Q. emoryi Torr. (J.R. Bacon, 2004). ** Filed under Q. emoryi.

Wednesday, February 12, 2014

got it!

I think this is it! Here, I do the standard partitioned RAD plot of favoring (panel A) vs. disfavoring (panel C) loci, then a regression on the main distribution of trees in panel A (the other trees in panel A are not plausible). Overlaying the 95% prediction interval on the regression for the main part of the partitioned RAD analysis, no points fall outside it except tree 122, which does not have anything obviously bad about it.

If there is a disproportionately supported tree (in terms of the number of loci supporting it, as a function of its log likelihood), we'd expect it to fall outside the prediction interval on the upper side. We don't see any.

Okay. This is how it goes into the paper! what I need now is simulated data to see how well this performs, but that will have to wait.
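The outlier check described in this post can be sketched as an ordinary least-squares fit with a prediction interval. This is a hypothetical Python sketch, not RADami's R implementation: `x` is assumed to be each tree's log-likelihood and `y` the number of loci favoring that tree, and `t_crit` must be supplied by the caller (for example, about 2.10 for a two-sided 95% interval with 18 degrees of freedom), because the sketch sticks to the standard library.

```python
import math
from typing import Callable, List, Sequence, Tuple


def ols_prediction_band(
    x: Sequence[float], y: Sequence[float], t_crit: float = 1.96
) -> Tuple[float, float, Callable[[float], Tuple[float, float]]]:
    """Fit y = a + b*x by least squares; return (b, a, band), where band(x0)
    is the (lower, upper) prediction interval for a new observation at x0.
    t_crit is the two-sided t quantile with n - 2 df (1.96 is the large-n limit)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard error

    def band(x0: float) -> Tuple[float, float]:
        y_hat = intercept + slope * x0
        half = t_crit * s * math.sqrt(1 + 1 / n + (x0 - mx) ** 2 / sxx)
        return y_hat - half, y_hat + half

    return slope, intercept, band


def flag_outliers(
    x: Sequence[float], y: Sequence[float], t_crit: float = 1.96
) -> Tuple[List[int], List[int]]:
    """Indices of points above and below the prediction interval of the fit."""
    _, _, band = ols_prediction_band(x, y, t_crit)
    above, below = [], []
    for i, (xi, yi) in enumerate(zip(x, y)):
        lower, upper = band(xi)
        if yi > upper:
            above.append(i)
        elif yi < lower:
            below.append(i)
    return above, below
```

A disproportionately supported tree would show up in the `above` list. Each point here also contributes to the fit, so a leave-one-out version would be somewhat stricter for flagging a single such tree.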

Tuesday, February 11, 2014

Venus, first violet, Ranunculus leaves

This was one of the most beautiful mornings we've had since arriving. I left Caudéran at 7:10, Venus bright in the east, hardly any water on the streets (which is unusual). I grabbed the paper in Pessac, waited 20 minutes for a late train, and by the time we left the sky was glowing orange and yellow. Train to Cestas / Gazinet, and in the inundated ditches along the roadside, enormous buttercup leaves were unfurled, along with patches of light green geranium. The first violet I've seen this year was open but submerged in the ditch. By the time I hit Pierroton, the sun was just coming up over the houses.

RADami ready to use

I redid the tree likelihoods as the likelihood for each tree on the full data matrix, and the plot now is a lot more informative. It turns out that using the summed locus likelihoods was, as I had feared, giving a falsely linear relationship. The conclusion now is the same -- no obvious secondary tree out there in this dataset -- but different islands of trees each have a separate distribution of likelihood scores:


Tree likelihood based on summed loci
Including only loci that have a minimum of 20 unique trees, overall likelihood range of 4,
and selecting supporting vs. disfavoring loci based on a 2-lnL point threshold.

Tree likelihood based on full data matrix
Including only loci that have a minimum of 20 unique trees, overall likelihood range of 4,
and selecting supporting vs. disfavoring loci based on a 2-lnL point threshold.

RADami now builds and installs fine on Linux and Windows (8.1; I assume there are no problems on older versions of Windows).

Friday, February 7, 2014

Assemblee Generale, BioGeCo 2014

Today was the annual Assemblée Générale de BioGeCo, held at the University of Bordeaux in Talence. This is the first day I've seen everyone present on their projects. Remy Petit, the BioGeCo leader for the past three years, opened with a statement on the unit as a whole. BioGeCo holds as its goal to “study the mechanisms governing the evolution of the diversity of terrestrial ecosystems, from genes to communities,” and to “rebalance ecology and genetics in a common evolutionary framework.” There is, of course, a very strong tree focus here, and a long history of forestry research.

Today, there are 109 people employed at BioGeCo: 68 salaried staff (43% lead researchers, 57% engineers and technicians), 21 PhD students, and 18 postdocs and contract researchers. From 2009 to 2014, the unit grew by 4 researchers (chercheurs) and 5 engineers.

I made just a few notes on projects that were of interest to me. This is neither exhaustive nor representative, only things that caught my interest:

Highlights in GEMfor [Cécile Robin]
·       Sequencing 150 species of pathogenic fungi
·       Studying the evolution of life-history traits (niche differentiation, virulence) using experimental and modeling approaches, in the light of climate change and colonization of novel environments and regions
·       The effect of parasitic fungi (oïdium, i.e., powdery mildew) on the evolution of oaks
·       Horizontal and vertical transmission of microorganisms in oaks
·       Role of microorganisms in the health of the plant (e.g., invasibility of the community by pathogenic species); this is tied in with climate change by looking at changes in microfungal communities in response to climatic variation

Highlights in Ecologie des Communautés [Emmanuel Corcket]
·       13 permanent positions (including 9 researchers, 3 engineers, 1 technician [?]), 9 contractual
·       Interested in both the distributional (patterns, processes underlying patterns) and functional (e.g., interactions) components of biodiversity, and the interactions between communities and their environments in both directions (e.g., responses to climate change, effects on nutrient and water cycling)
·       Are trees a landscape (‘paysage’) for bacterial communities?
·       Impacts of atmospheric deposition on grassland (‘prairiale’) biodiversity in the Pyrenees.
·       Latitudinal distribution of birds and bats
·       Bird predation and trophic cascades
·       Effects of plant phylogenetic diversity on herbivory depend on herbivore specialization
·       Interaction between host / nonhost abundance and insect invasion in pines
·       Restoration as a tool for environmental remediation and effects on ecosystem processes
·       Risk analysis: movements of forest insect herbivores; effects of forest insect herbivores on plant growth
·       Herbivory-resistance traits
·       Indirect effect of invasive insects on the diversity of insular [endemic?] forest birds on Corsica

Ecologie et Génomique Fonctionnelles (EGF) [Annabel Porté]
·       23 permanent staff; 93 publications in 2011-2013
·       The interaction between genotype and phenotype that affects adaptation
·       Strategy: produce and integrate understanding on function and structure…
·       Cavitation resistance in pines
o   Genetic architecture (QTL)
o   Physiological
o   Leaf phenology
o   Phylogenetic distribution
·       QTLs underlying budburst (débourrement)
·       Rapid evolution of populations
·       Artificial selection and its effects on the genomic architecture of differentiation
·       A genetic map for oaks with higher density of markers
o   Genomic signature of adaptation and speciation in oaks

o   A phylogeny of the genus Quercus

Thursday, February 6, 2014

Fulbright and Paris Herbarium -- post visit

I was unsure what to expect of the Fulbright midyear meeting. The common ground for everyone is being an American in France, which doesn’t seem like all that much to go on. Fortunately, it’s a very interesting group of people, broad-minded and engaged. The main meeting was held in the building where the Marshall Plan was hammered out. The meeting opened with a tour of the building, a tour that closed with a photo of the courtyard in the days after the liberation of Paris, filled with debris from the bombing. Highlights of the meeting for me included conversations with a philosophy of biology student on a causal criterion for natural categories (including, of interest to both of us, the category of species); an electrical engineer working on signal processing, about stochastic computing, the uses of graphs in both signal processing and evolutionary biology, and the rise and fall of Bell Laboratories; a photographer and poet who is working on a book of poems about the first person ever to take a still photo; and a chemical engineer, about the importance of learning to be an outsider. The meeting ended with a concert of 20th-century French organ music. There was an amazing piece by Messiaen that rattled all of us.

I spent yesterday morning at the Paris herbarium with Béatrice Chassée of the International Oak Society, getting a sense of the oak collection as a whole and working through some of the eastern North American Lobatae. I got through everything I wanted to that morning and have a good sense of what material I need to go through when I’m back up here in two weeks. Then Béatrice and I had lunch at the Paris mosque and talked about the IOS, its journal International Oaks, and oak collections around the world. Then off to the train, and back home.

Tuesday, February 4, 2014

Fulbright meeting, Paris

Today and tomorrow are the midyear meeting in Paris for France Fulbright and Chateaubriand Scholars. The train left Gare St. Jean (Bordeaux) at 8:23, exactly as the sun was rising. As we went out of town to the north, the sun was lighting up the towers over the bridge beside the CAP Sciences Museum. This side of town I don’t know at all (we live on the west side, in Caudéran, and my commute to work takes me over to Pessac to catch the train, so I have missed north Bordeaux altogether so far). Like the rest of Bordeaux, this side grades into vineyards pretty quickly; there’s an active chateau now off to the east, and another we’ve just passed that appears to be a private home, no longer active in grape-growing or wine-making. Fourteen minutes out, we’ve just passed another village, all I could see of which was the cathedral surrounded by its cemetery. There are horses and little round haystacks in the sodden fields, and the ditches are brilliant with duckweed or something else bright green. The train is full of people going about their business, one knitting, another reading the paper.

I asked my son David two nights ago what his favorite part of the trip has been so far, from the minute we left Downers Grove. He said it was the flight over. He didn’t know why. It certainly wasn’t for lack of enjoying the rest of the trip: he’s been happy, even exuberant, over so much on this trip. But I understand. The time between destinations transcends what we do at either end. Neither here nor there, a time when plans are evolving, interruptions are minimal, you are with family or strangers or by yourself, and you have time to think and watch out the window, and nothing else you can do. “Old men ought to be explorers / Here and there does not matter / We must be still and still moving” (T.S. Eliot, East Coker). When you travel, constraints and desires come into a different kind of balance.

The last time we made this trip was on our way from Paris to Bordeaux to begin this project. The trees were all full of round masses of bright green stuff, which we took to be some kind of squirrel’s nest. Anyone reading this who knows better will immediately recognize that this was in fact mistletoe. It’s all over here, today perhaps a bit greener than it was at the end of December. The fields are much wetter; we’ve had loads of rain since we’ve been here. I wonder if the horses get colds from standing in it all day?

Back to work. I’m trying to finish documenting the codebase for RADami today. That has been a ridiculously slow process with little to write about, but I think I can finish it on the train and get the manuscript revised and resubmitted this week, so I can get on with new RAD work next week. One realization: there may be a bias in the partitioned RAD visualization as I’ve been doing it, because the expectation is linear with a slope of 1.0 when (1) the tree likelihood is calculated as the sum of locus log-likelihoods over just the loci used for the partitioned analysis, rather than as the global likelihood for all loci; and (2) locus log-likelihoods are assigned based on topological identity with the pruned trees that each locus votes on. I’m not sure this introduces a bias. It should reduce noise: requiring a locus to vote on trees that are topologically identical to one another when pruned down to just the taxa in that locus, but not identical when all taxa are included, introduces noise. Because the optimization runs only until it stops improving by epsilon, trees that all lie at one point on the likelihood surface may appear to be at different positions. So in this sense the visualization as written biases us toward a tighter-fitting plot; but does it bias us toward a linear relationship? I’ll set this up to run the global likelihood on each tree as well, and see whether that plot is also so nice.
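One way to run the check described above, sketched in Python rather than the R used for RADami: sum each tree's locus log-likelihoods over only the loci used in the partitioned analysis, then compare those sums against the global lnL for the same trees with a least-squares slope and a correlation coefficient. The data layout (a dict mapping each locus to a list of per-tree lnL values) and the function names `summed_locus_lnl` and `slope_and_r` are hypothetical, not part of RADami.

```python
# Hypothetical sketch: compare per-tree sums of locus lnL (restricted to the
# loci used in the partitioned analysis) with per-tree global lnL.
# Data layout and function names are assumptions for illustration.

def summed_locus_lnl(locus_lnl, used_loci):
    """locus_lnl: dict locus -> list of lnL values, one per tree (same order).
    Returns one summed lnL per tree, using only the loci in used_loci."""
    n_trees = len(next(iter(locus_lnl.values())))
    totals = [0.0] * n_trees
    for locus in used_loci:
        for i, value in enumerate(locus_lnl[locus]):
            totals[i] += value
    return totals


def slope_and_r(x, y):
    """Ordinary least-squares slope of y on x, and the Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, sxy / (sxx * syy) ** 0.5
```

Plotting the summed values against the global ones (and fitting the slope) shows directly how far the partitioned sum departs from the global likelihood across the treeset.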

Thursday, January 23, 2014

The RAD partitioning works and is much cleaner, and tells us something meaningful

I was beginning to think this was a waste of time, but in fact the analysis is now much cleaner and nicer. The workflow is explained in the last few posts. I'm now deciding whether a locus votes for or against a tree using a likelihood threshold. In the plots below, I've filtered out all loci with a log-likelihood spread of less than 5 points, and a locus counts as voting for a tree only if that tree falls within the top 2 lnL points of the locus's range, and against it only if the tree falls within the bottom 2 lnL points. Narrowing the spread doesn't change the story, but makes it noisier.
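A minimal Python sketch of the voting rule described above, for a single locus. The function name `locus_votes` and the list-of-lnL input are assumptions for illustration, not the author's R code; the defaults match the thresholds used here (minimum spread of 5 lnL points, 2-point windows at the top and bottom).

```python
# Hypothetical sketch of likelihood-threshold voting for one locus.
# lnls holds the locus's lnL for each tree, in a fixed tree order.

def locus_votes(lnls, window=2.0, min_spread=5.0):
    """Return +1 (favor), -1 (disfavor), or 0 for each tree,
    or None if the locus's lnL spread is too small to vote."""
    best, worst = max(lnls), min(lnls)
    if best - worst < min_spread:
        return None  # filtered out: spread below the threshold
    votes = []
    for value in lnls:
        if value >= best - window:
            votes.append(1)   # within `window` points of the locus's best tree
        elif value <= worst + window:
            votes.append(-1)  # within `window` points of the locus's worst tree
        else:
            votes.append(0)   # in the middle of the range: no vote
    return votes
```

Requiring a spread of at least 5 points with 2-point windows also guarantees that no tree can land in both the favored and the disfavored window.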

The result is striking: there is no tree favored by more loci than the best tree. There should be an outlier test associated with this, but just eyeballing it, there seems to be no obvious second story lurking among the suboptimal trees that is supported by a lot of loci. What is needed now is some simulation of what the data look like from a clean, single-story dataset. It's hard to imagine it looking much cleaner than what we get here.

Loci that favor the optimal topology and each of 200 permuted topologies

Loci that disfavor the optimal topology and each of 200 permuted topologies

Tuesday, January 21, 2014

Updating the partitioned RAD analysis

I ended up rewriting the RAD scripts so that they split out a new dataset and set of trees for each locus. The workflow goes like this:
  1. Generate a set of topologies using NNI. Use: genTrees [fixed today to return unique trees, which it wasn't necessarily doing before]
  2. Export each locus with its unique set of trees, plus a file with the tree indices for each locus. Use: gen.RAD.loci.datasets
  3. Make the resulting sh files executable using chmod u+rwx raxml.batch.*, then execute them using sh. Note that the export is currently written for Linux and defaults to my directory structure.
  4. Read in the info files to get the likelihood for each tree and the index files to get the original tree index, then apply those likelihoods to the trees: use match.lnL.to.trees. I see now that the way the index is written is a bit awkward. What I want from the index is to know which new tree points to which old tree, but what the index actually gives me is, for each old tree, which other old tree that tree is identical to. I guess I can get what I want by just taking unique(treeIndex)... okay, that works.
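The index inversion in step 4 can be sketched in Python as follows, assuming (hypothetically) that `tree_index[i]` holds the 0-based position of the first old tree identical to old tree `i`; R's `unique(treeIndex)` corresponds to the first-occurrence de-duplication done by `dict.fromkeys` here. The function names are hypothetical and not part of the RADami scripts.

```python
# Hypothetical sketch: invert a "which old tree is this identical to" index.
# Assumes 0-based indices and that each group is represented by its first member.

def new_to_old(tree_index):
    """For each new (unique) tree, the old tree it stands for (first-occurrence order)."""
    return list(dict.fromkeys(tree_index))


def old_to_new(tree_index):
    """For each old tree, the position of its unique representative among the new trees."""
    position = {old: new for new, old in enumerate(new_to_old(tree_index))}
    return [position[t] for t in tree_index]


def spread_lnl(tree_index, unique_lnl):
    """Apply the lnL computed for each unique tree back to every old tree."""
    return [unique_lnl[j] for j in old_to_new(tree_index)]
```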
Okay! That all runs fine. I'm back to the problem of deciding which trees each locus favors and disfavors. I've scrapped all the old SWUL code, which was convoluted, hard to follow, and inappropriately named (it's no longer successive weighting, but locus-partitioned data exploration).

Figuring this part out now...

Plot at the end: for each tree, how many loci favor it, how many disfavor it, how many don't vote, and favoring minus disfavoring
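A sketch of the per-tree tallies behind that plot, in Python, assuming each locus's votes are already coded as +1 (favor), -1 (disfavor), or 0 (no vote) in a shared tree order; `vote_summary` is a hypothetical name.

```python
# Hypothetical sketch: per-tree tallies of locus votes.
# votes_by_locus is a list of per-locus vote lists, all in the same tree order.

def vote_summary(votes_by_locus):
    """Return per-tree counts of favoring, disfavoring, and abstaining loci,
    plus the net score (favoring minus disfavoring)."""
    n_trees = len(votes_by_locus[0])
    favor = [0] * n_trees
    disfavor = [0] * n_trees
    abstain = [0] * n_trees
    for votes in votes_by_locus:
        for i, v in enumerate(votes):
            if v > 0:
                favor[i] += 1
            elif v < 0:
                disfavor[i] += 1
            else:
                abstain[i] += 1
    net = [f - d for f, d in zip(favor, disfavor)]
    return {"favor": favor, "disfavor": disfavor, "abstain": abstain, "net": net}
```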

Monday, January 20, 2014

Frost, thrushes, hazelnuts

This morning was the first frost we've had since our arrival about a month ago. Two days ago, I started noticing thrushes singing: mistle thrush? song thrush? I haven't seen one. There's a big, beautiful hazelnut bush near the entrance here whose catkins have descended. Bright yellow Ulex are in bloom in the sandy fields behind the building here. Days are getting longer and everything is greening up.

Friday, January 17, 2014

Making unique trees -- updated in ape 3.1

Emmanuel Paradis is including an updated version of unique.multiPhylo in ape 3.1, which will go onto CRAN soon. The old trees index in that function is accessed using attr(x, 'old.index') and is ordered the same as in the version I posted earlier this week.

Monday, January 13, 2014

Making unique treesets

unique.multiPhylo doesn't toss out duplicate trees in my dataset. The problem isn't all.identical.phylo: my function compare.all.trees is slow, but it does seem to detect pairwise tree identity properly (there are 16 in the treeset I have) using all.equal.phylo. The issue is with indexing: replacing 'for s in x[keep]' with 'for s in 1:length(keep)' to index the trees, and then comparing with x[[s]], solves it.
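The indexing fix can be illustrated with a generic Python sketch of de-duplication under a custom equality test: the inner loop runs over the positions of retained items and looks each one up in the original list, analogous to looping over 1:length(keep) and comparing with x[[s]]. Here `same` stands in for a tree comparison such as all.equal.phylo; `unique_with_index` is a hypothetical name, and the returned index (position of each original item in the unique list) is an assumption about how such an attribute could be laid out.

```python
# Hypothetical sketch: keep the first member of each group of "same" items,
# and record, for every original item, which unique item it maps to.

def unique_with_index(items, same):
    keep = []       # positions (in items) of the retained representatives
    old_index = []  # for each original item, its position in the unique list
    for i, x in enumerate(items):
        for k, s in enumerate(keep):  # iterate over positions, then look up items[s]
            if same(x, items[s]):
                old_index.append(k)
                break
        else:  # no retained item matched: x is new
            keep.append(i)
            old_index.append(len(keep) - 1)
    return [items[s] for s in keep], old_index
```

Because every item is compared against every retained one, this is quadratic in the number of trees, which is one reason pairwise tree comparison gets slow on large treesets.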

I also added to the output an attribute that indexes the original trees to the new (unique) trees. The updated function is at 

http://mor-systematics.googlecode.com/svn/trunk/rads/unique.multiPhylo.R

This finishes step 2 of the analysis workflow from last week.

Friday, January 10, 2014

Refining the RAD incongruence method

Building on last night's post, the results of these analyses may be cleaner when you filter for just the loci that are especially conclusive. Compare the two figures below:

Output from plot.swulLikelihood, including all 9,179 loci
Output from plot.swulLikelihood, filtering to the 587 loci with lnL range ≥ 1.5

In the lower panels, only loci with a log-likelihood range of 1.5 or greater are included. The result is a tighter distribution of points in all three panels, suggesting that we are getting rid of some noise. Better!
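The filter itself is simple; here is a Python sketch, assuming per-locus lnL values stored as a dict of lists (`conclusive_loci` is a hypothetical name, not the plot.swulLikelihood code).

```python
# Hypothetical sketch: keep only loci whose lnL range across trees
# (best minus worst) is at least min_range.

def conclusive_loci(locus_lnl, min_range=1.5):
    return {locus: lnls for locus, lnls in locus_lnl.items()
            if max(lnls) - min(lnls) >= min_range}
```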

Now, what if we just let loci talk about the trees they care about? Can we do this with the likelihood calculations the way they were done? As it stands, each locus is forced to vote on every tree, unpruned. I think in the best implementation, I would take the 201 trees and, for each unique taxon set (not each locus, because many loci will have identical sets of taxa), spit out a treeset pruned to just that taxon set, then calculate the site likelihoods based on those treesets. Then every locus would be evaluating every tree, but many trees would be topologically identical because of the pruning, and which trees were topologically identical would differ among loci. Does this create a problem for interpretation? We are still investigating the balance of loci (i.e., for each tree, we are asking how many loci rank the tree in the top x% and bottom x% of trees, using lnL), but we are blurring (at best) the search for cliques of loci that favor alternative topologies, because the loci are actually voting on different topologies. Maybe this method really makes the most sense applied node-by-node, asking how the loci that have anything to say about a given node are voting.
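Grouping loci by unique taxon set could be sketched in Python like this, so that each treeset is pruned once per distinct taxon set rather than once per locus. The input format and the name `group_loci_by_taxa` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sketch: group loci that share exactly the same set of taxa.
# locus_taxa maps each locus name to the taxa sampled for that locus.

def group_loci_by_taxa(locus_taxa):
    """Return dict frozenset(taxa) -> sorted list of loci with that taxon set."""
    groups = defaultdict(list)
    for locus, taxa in locus_taxa.items():
        groups[frozenset(taxa)].append(locus)  # order of taxa does not matter
    return {taxa: sorted(loci) for taxa, loci in groups.items()}
```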

But back to the analysis at hand, which is really focused on identifying alternative topologies supported by different suites of loci. Perhaps the best thing to do is to:
  1. Prune the trees for each locus. Output: a treeset for each locus that has identical tips in all trees and in the locus
  2. Identify the unique groups of topologies post-pruning. Output: a vector for each locus indexing all trees by the unique topology to which they belong; and a vector of unique topologies (just a number is fine, as the topologies can be looked up from the treeset and the corresponding locus vector)
  3. For each locus, average the likelihood across all trees that collapse to the same unique topology. Output: a vector of locus likelihoods whose length matches the unique topologies vector.
  4. Give a "favored" or "disfavored" assignment to the unique topologies for each locus. Here is where a problem may come in for thinking about these trees as a whole, rather than working node-by-node: what about when a locus is only meaningful for a handful of topologies? Ties among topologies will not have an effect on the ranking. So if there are 3 topologies, we have a 33rd percentile and a 67th percentile. If we are doing a 50-50 analysis, we say the top topology is supported at the 50th percentile, the bottom topology is rejected at the 50th percentile, and the middle is both supported and rejected. Every topology will be counted at least once by every locus... but hold on! This will increase the weight of loci with the poorest taxon sampling, because they will affect the most trees. Perhaps a good solution is to use the 33rd / 67th percentiles to avoid just this problem, and to toss out loci that have only two topologies. Output: a vector of 1, -1, or 0 for the unique topologies of each locus, indicating whether each is favored, disfavored, or neutral.
  5. Assign the rankings to all topologies. Output: a vector of 1, -1, or 0 for every topology, using the index created in step 2.
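Steps 2 through 5 for a single locus might look like the Python sketch below. It assumes each pruned tree has already been reduced to a hashable label (for example, a canonical Newick string) so that topologically identical trees compare equal. That labeling, the function name `assign_votes`, and the exact tertile rule (rank-based percentile of 2/3 or higher favored, 1/3 or lower disfavored, loci with fewer than 3 unique topologies dropped) are one possible reading of the plan above, not a finished implementation.

```python
# Hypothetical sketch of steps 2-5 for one locus.
# pruned_topologies[i]: hashable label for tree i after pruning to the locus's taxa.
# lnls[i]: the locus's lnL for tree i.

def assign_votes(pruned_topologies, lnls):
    # Step 2: index each tree by its unique post-pruning topology
    uniques = list(dict.fromkeys(pruned_topologies))
    n = len(uniques)
    if n < 3:
        return None  # too few unique topologies: toss this locus
    position = {t: j for j, t in enumerate(uniques)}
    index = [position[t] for t in pruned_topologies]
    # Step 3: average lnL over trees that collapse to the same topology
    sums, counts = [0.0] * n, [0] * n
    for j, value in zip(index, lnls):
        sums[j] += value
        counts[j] += 1
    means = [s / c for s, c in zip(sums, counts)]
    # Step 4: rank unique topologies (worst first) and vote by tertile
    order = sorted(range(n), key=lambda j: means[j])
    unique_vote = [0] * n
    for rank, j in enumerate(order):
        pct = rank / (n - 1)
        if pct >= 2 / 3:
            unique_vote[j] = 1    # top third: favored
        elif pct <= 1 / 3:
            unique_vote[j] = -1   # bottom third: disfavored
    # Step 5: map the votes back to every original tree
    return [unique_vote[j] for j in index]
```

With exactly 3 unique topologies this gives the top one +1, the middle one 0, and the bottom one -1, matching the 33rd / 67th percentile idea in step 4.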
Okay. I think this is a lot cleaner. In fact, this could be done just fine on a set of topologies created uniquely for each locus, which would be cleaner still, but it would mean more time spent chopping up datasets and analyzing trees on each dataset, probably for no or negligible difference in the result. For now, I'll work with the likelihoods we got by analyzing the full trees.

Time to rewrite.