Molecular markers for linkage analysis in forest trees: a literature review
(Christophe Plomion)
Linkage maps have been constructed in trees, and especially in conifers, using several types of markers, such as allozymes, proteins, Restriction Fragment Length Polymorphisms (RFLPs) and Random Amplified Polymorphic DNAs (RAPDs).
The first linkage studies were based on the segregation of allozymes in the megagametophyte seed tissue of conifers. More than 10 species were studied for about 15 loci (Guries et al., 1978; Rudin and Eckberg, 1978; O'Malley et al., 1979; Ekert et al., 1981; Cheliak et al., 1984; O'Malley et al., 1986; Furmier et al., 1986; Strauss and Conkle, 1986; El-Kassaby et al., 1987; Shiraishi, 1988; Szmidt and Muona, 1989). Conkle (1981) and Niebling et al. (1987) described more loci, but because of the paucity of isozyme loci and their associated polymorphisms, linkage analysis could not cover the whole genome. This limitation led several groups to use other types of molecular markers.
Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) of total megagametophyte proteins allowed the study of a much larger number of loci than had been possible with isozyme analysis. Bahrman and Damerval (1989) reported linkage analysis for 119 loci, and Gerber et al. (1993) reported a 65-locus linkage map covering 530 cM of the maritime pine genome. Although the number of loci is higher than with isozymes, it is still limited, which makes proteins poorly suited to applications that require broader genome coverage (e.g. QTL analysis). Besides, the method is expensive and the interpretation of the gels requires a tremendous amount of experience, since all the markers are resolved on the same gel.
The development of RFLPs for high-density genomic mapping (Botstein et al., 1980) provided a new technique that overcame some of the problems associated with isozymes and proteins. Extensive linkage maps based on RFLP markers have been constructed for a great number of agronomic crops and annual plant species. In trees, few studies have been reported. Devey et al. (1994) presented linkage groups in loblolly pine for 80 RFLPs detected using cDNA probes. RFLP methods are well suited to mapping across species because the same hybridization probes can be used in comparisons among species. Ahuja et al. (1994) showed that mapped DNA probes from loblolly pine could be used to construct RFLP maps for related species. An RFLP-based genomic map using loblolly pine cDNA probes is being constructed in Douglas-fir (Jermstad et al., 1994). An RFLP map of radiata pine has also been constructed with cDNA probes from loblolly pine and radiata pine (Devey et al., 1996). Linkage maps using mostly RFLP markers have recently been presented for poplar (Bradshaw et al., 1994) and Eucalyptus (Byrne et al., 1995). Although the number of RFLPs is virtually unlimited, they require elaborate laboratory techniques (development of specific probe libraries, use of radioisotopes, Southern blot hybridization and autoradiography), which make them labor-intensive, time-consuming and costly (Kesseli et al., 1994). Neale et al. (1989) estimated that screening RFLPs in the parents and progeny of one cross would take approximately 3 years. In addition, in some tree species such as pine, the DNA content is so high (Wakamiya et al., 1993) that single-copy Southern hybridization may be impractical, or at least requires very lengthy exposures.
During the past four years, the development of PCR-based arbitrarily primed genetic assays called RAPD (Random Amplified Polymorphic DNA, Williams et al., 1990), AP-PCR (Arbitrarily Primed PCR, Welsh and McClelland, 1990) or DAF (DNA Amplification Fingerprinting, Caetano-Anolles et al., 1991) has greatly changed the prospects for applying molecular markers to study populations and to accelerate breeding (Rafalski et al., 1991; Rafalski and Tingey, 1993). In particular, RAPD markers provide a very powerful tool for generating relatively dense linkage maps in a short period of time. The RAPD technique uses arbitrary 10-base oligonucleotides as primers and the polymerase chain reaction (PCR) to amplify specific DNA fragments. Polymorphisms detected between individuals presumably result from several kinds of change, including sequence differences in one or both of the primer binding sites, insertion/deletion events, or rearrangements in the priming sites or in the internal amplified sequence. They are visible as the presence or absence of a particular amplified product from a single locus (Welsh et al., 1992; Williams et al., 1990). Arbitrarily primed PCR markers are therefore usually dominant, because individuals carrying two copies of an allele (homozygous, band present) cannot be distinguished from individuals carrying one copy (heterozygous, band present). This dominant mode of inheritance is not an issue for genetic mapping when the haploid megagametophyte of gymnosperms is used, or when RAPD primers are screened for informative markers that segregate 1:1 in diploid tissue of angiosperms and gymnosperms (Carlson et al., 1991). Using the latter technique, called the "pseudo-testcross mapping strategy", Grattapaglia and Sederoff (1994) simultaneously produced two RAPD maps of approximately 250 loci each in an interspecific cross of Eucalyptus.
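The dominance argument above can be made concrete with a small enumeration. The following is only an illustrative sketch: the two-character genotype notation ("+" for an allele that amplifies, "-" for a null allele) and the function names are assumptions introduced here, not taken from the cited studies.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def band_phenotype(genotype):
    """Gel phenotype of a genotype: the band shows if any '+' allele is carried."""
    return "present" if "+" in genotype else "absent"


def progeny_ratio(parent1, parent2):
    """Expected band phenotype frequencies among diploid progeny of a cross.

    Each parent is a two-character string such as '+-'; every combination of
    one allele from each parent is assumed to be equally likely.
    """
    counts = Counter(band_phenotype(a + b) for a, b in product(parent1, parent2))
    total = sum(counts.values())
    return {phenotype: Fraction(n, total) for phenotype, n in counts.items()}


# Dominance: homozygous '++' and heterozygous '+-' look identical on the gel.
print(band_phenotype("++"), band_phenotype("+-"), band_phenotype("--"))

# Testcross-like configuration ('+-' x '--'): the band segregates 1:1.
print(progeny_ratio("+-", "--"))

# Both parents heterozygous ('+-' x '+-'): 3:1, less informative for mapping.
print(progeny_ratio("+-", "+-"))

# Haploid megagametophytes of a '+-' tree carry a single allele each,
# so presence/absence reveals the allele directly.
print(Counter(band_phenotype(allele) for allele in "+-"))
```

In this sketch only the heterozygous-by-null cross and the haploid megagametophytes give a 1:1 presence/absence ratio, which is why those two situations are the ones in which dominance does not hinder mapping.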
The "pseudo-testcross mapping strategy" is based on the selection of single-dose polymorphic markers that are heterozygous in one parent and homozygous null in the other, and that therefore segregate 1:1 in the progeny, as in a testcross (Carlson et al., 1991). The strategy based on megagametophytes from a single seed parent (the "half-sib mapping strategy") has been used for the construction of genetic maps in loblolly pine (Grattapaglia et al., 1992a), white spruce (Tulsieram et al., 1992), slash pine (Nelson et al., 1993), longleaf pine (Nelson et al., 1994), Norway spruce (Binelli and Bucci, 1994), Douglas-fir (Broome and Carlson, 1994), maritime pine (Plomion et al., 1995a, 1995b) and Scots pine (Yazdani et al., 1995). The efficiency of these strategies depends on finding individual trees that are heterozygous at many loci.
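Whether a band actually segregates 1:1, as both strategies above require, is commonly checked with a chi-square goodness-of-fit test. The sketch below is one possible way to do this and is not the procedure of any cited study; the function names, the 0.05 threshold and the band counts in the usage lines are hypothetical.

```python
import math


def chi_square_1to1(n_present, n_absent):
    """Chi-square goodness-of-fit test (1 d.f.) against a 1:1 ratio.

    Returns (chi2, p_value). With one degree of freedom the upper tail
    probability of the chi-square distribution equals erfc(sqrt(chi2 / 2)).
    """
    n = n_present + n_absent
    if n == 0:
        raise ValueError("at least one scored individual is required")
    expected = n / 2
    chi2 = ((n_present - expected) ** 2 + (n_absent - expected) ** 2) / expected
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def is_pseudo_testcross_marker(band_in_parent1, band_in_parent2,
                               n_present, n_absent, alpha=0.05):
    """Keep a marker if the band is seen in exactly one parent and the
    progeny counts do not deviate significantly from 1:1 (alpha is an
    assumed threshold)."""
    if band_in_parent1 == band_in_parent2:
        return False
    _, p_value = chi_square_1to1(n_present, n_absent)
    return p_value >= alpha


# Hypothetical counts: 48 progeny with the band, 52 without.
print(chi_square_1to1(48, 52))
print(is_pseudo_testcross_marker(True, False, 48, 52))
# Hypothetical distorted marker: 60 with the band, 40 without.
print(is_pseudo_testcross_marker(True, False, 60, 40))
```

A band that is present in one parent but homozygous there would appear in all progeny and fail the test, so the 1:1 check also filters out markers that are not single-dose. The same test applies to presence/absence counts among the megagametophytes of one seed parent.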
RAPD markers have also been used to rapidly identify markers linked to genes or genomic regions of interest by bulked segregant analysis (Michelmore et al., 1991). Using this method, Devey et al. (1995) found 10 RAPD markers flanking the gene for resistance to white pine blister rust in sugar pine, six of them within 5 cM of the gene. In loblolly pine, a dominant gene that confers resistance to fusiform rust disease was identified by genomic mapping (Wilcox et al., 1995). In Norway spruce, Lehner et al. (1995) identified a RAPD marker closely linked to the pendula gene.
In Eucalyptus, RAPD markers have been used in the genetic analysis of individuals and populations, including clone fingerprinting, outcrossing rate estimation and studies of phylogenetic relationships (Grattapaglia et al., 1992b). In oak, the molecular differentiation between Q. petraea and Q. robur was evaluated with RAPD markers (Moreau et al., 1994).
The advantages of arbitrarily primed PCR markers such as RAPDs are the small amount of DNA required (5-20 ng), the speed of screening for polymorphisms, the efficiency with which large numbers of markers can be generated for genomic mapping, and the potential for automation of the technique (Neale and Sederoff, 1991; Nelson et al., 1992; Sobral and Honeycutt, 1993). In addition, no prior sequence knowledge is required. Since primers can be chosen arbitrarily, any organism can be mapped with the same set of primers. These advantages make RAPD markers far easier to work with than RFLPs and thus very attractive for breeding applications (Rafalski et al., 1991). One major impact of the RAPD technique has therefore been to increase the number of species amenable to mapping, and this is particularly true of forest trees. Although the repeatability of RAPD markers is high within a laboratory using one type of thermocycler and one set of amplification conditions, the correspondence of markers resolved in different laboratories has been a problem.
In order to find markers that combine the advantages of both RAPDs (i.e. PCR-based markers, no probe maintenance or distribution) and RFLPs (i.e. co-dominant mode of inheritance), and that could potentially be used across families, Sequence Tagged Site (STS) markers have recently been developed in crop plants (Williams et al., 1991; Tragoonrung et al., 1992; Konieczny and Ausubel, 1993) and more recently in trees. An STS (Olson et al., 1989) is a unique, single-copy segment of the genome whose DNA sequence is known and which can be amplified by specific PCR. When STS loci contain DNA length polymorphisms (e.g. simple sequence length polymorphisms, SSLPs), they become valuable genetic markers. The main advantage of STS loci lies in the speed with which they can be analyzed once PCR primer pairs have been identified. Like RFLP loci, STS loci can be analyzed as co-dominant genetic markers and can, in theory, be studied in any member of the species or of closely related species, provided that the DNA sequence is conserved at the PCR primer sites. Analysis with STS markers thus combines the speed of RAPD markers with the informativeness of RFLP markers. Three types of STS have been reported in trees. One type contains Simple Sequence Repeats (SSRs), also known as microsatellites, which consist of tandemly repeated copies of mono-, di-, tri- and tetranucleotide motifs. Microsatellites are reported to be ubiquitous, abundant and highly polymorphic markers. They have been well characterized in mammalian genomes and in a small number of plant genomes (Akkaya et al., 1992; Morgante and Olivieri, 1993; Zhao and Kochert, 1993). Their characteristics, including the fact that they can be typed rapidly using PCR techniques, make SSRs an attractive option for mapping and fingerprinting in trees. Besides, they are potentially multiallelic and can therefore provide more information per marker.
Microsatellites have been identified in radiata pine (Smith and Devey, 1994), loblolly pine (Nelson et al., 1996), European larch (Hutchison, 1994), slash pine (Kamm et al., 1994), many other conifer species (Marquardt and Echt, 1995), white birch (Mikko et al., 1996), and oak (Steinkellner et al., 1995; Takayuki et al., 1996). A second type of STS marker developed in trees is derived from Random Amplified Polymorphic DNA (RAPD) fragments that have been sequenced, allowing PCR primers to be designed for the ends of the RAPD fragment. These STS-converted RAPD markers are sometimes referred to as SCARs, for Sequence Characterized Amplified Regions (Paran and Michelmore, 1993). While SCARs allow rapid STS marker development, they may not prove to be highly polymorphic (Bodénès et al., 1996). SSR markers, on the other hand, are expected to be highly variable across most populations. However, early results suggested that microsatellite variation could be rare in pine (Hutchison et al., 1994). If microsatellites are generated by a mechanism that involves recombination (e.g., unequal crossing over), the low recombination rate per nucleotide might, at least in part, explain their scarcity in pine. Finally, STS have also been developed from cDNA (ETS: Expressed Tagged Sites) and DNA clones in poplar and in pine (Bradshaw et al., 1994; Harry and Neale, 1994), but in nearly all cases polymorphisms were apparent only after digesting the PCR fragments with restriction enzymes, which makes the identification of segregating polymorphisms time-consuming and costly.
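The definition of SSRs given above, tandem repeats of mono- to tetranucleotide motifs, can be illustrated with a short scan of a DNA sequence. This is a minimal sketch for perfect repeats only; the function name, the minimum of five repeat units and the example sequence are assumptions chosen for illustration, not criteria used in the cited studies.

```python
import re


def find_ssrs(sequence, min_repeats=5, max_motif_len=4):
    """Return (start, motif, repeat_count) for each perfect tandem repeat.

    A motif of 1 to max_motif_len bases must occur at least min_repeats
    times in a row. The lazy quantifier makes the regex try the shortest
    motif first, so 'CACACACACA' is reported as (CA)5 and not (CACA)2.
    """
    seq = sequence.upper()
    pattern = re.compile(
        r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif_len, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits


# Hypothetical sequence containing a (CA)6 dinucleotide repeat.
example = "GGTT" + "CA" * 6 + "GGAT"
print(find_ssrs(example))
```

Because alleles at an SSR locus typically differ in the number of repeat units, PCR primers placed in the unique flanking sequence amplify fragments of different lengths, which is what makes such loci usable as co-dominant, potentially multiallelic STS markers.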