Haptoglobin (HP) and Haptoglobin-related protein (HPR) copy number variation, natural selection, and trypanosomiasis
Haptoglobin, coded by the HP gene, is a plasma protein that acts as a scavenger for free haemoglobin, and haptoglobin-related protein (coded by the HPR gene) forms part of the trypanolytic factor TLF-1, together with apolipoprotein L1 (ApoL1). We analyse the polymorphic small intragenic duplication of the HP gene, with alleles Hp1 and Hp2, in 52 populations, and find no evidence for natural selection either from extended haplotype analysis or from correlation with pathogen richness matrices. Using fiber-FISH, the paralog ratio test, and array-CGH data, we also confirm that the HPR gene is copy number variable, with duplication of the whole HPR gene at polymorphic frequencies in west and central Africa, up to an allele frequency of 15 %. The geographical distribution of the HPR duplication allele overlaps the region where the pathogen causing chronic human African trypanosomiasis, Trypanosoma brucei gambiense, is endemic. The HPR duplication arose on a single SNP haplotype, but there is no strong evidence of extended homozygosity, a characteristic of recent natural selection. The HPR duplication shows a slight, non-significant undertransmission to human African trypanosomiasis-affected children of unaffected parents in the Democratic Republic of Congo. However, taken together with alleles of APOL1, there is an overall significant undertransmission of putative protective alleles to human African trypanosomiasis-affected children.
Haptoglobin (Hp), encoded by the gene HP, is an abundant acute-phase glycoprotein in the plasma which binds free haemoglobin (Hb) that has been released by lysis of erythrocytes, often as a result of infection. The resulting haptoglobin-haemoglobin complex is cleared by binding to the macrophage scavenging receptor CD163, followed by endocytosis. This process prevents the oxidative damage and disruption of nitric oxide homeostasis caused by free heme molecules (Nielsen and Moestrup 2009).
Because of its abundance in blood plasma, Hp was one of the first blood serum proteins to be analysed by native protein electrophoresis to identify polymorphic variation (Smithies 1955). Two electrophoretic alleles, termed Hp1 and Hp2, were subsequently characterised as resulting from a 1.7 kb intragenic duplication so that the Hp2 allele encodes a longer peptide chain than the Hp1 allele (Maeda et al. 1984; Smithies et al. 1962). The two alleles encode proteins that are functionally different, and have been associated with a variety of clinical conditions (Langlois and Delanghe 1996). There is evidence that homozygotes for the Hp2 allele are more protected against severe malaria (Atkinson et al. 2007; Quaye et al. 2000), although such a link remains controversial (Aucan et al. 2002; Bienzle et al. 2005).
Like Hp, Hpr binds free haemoglobin with high affinity, but the resulting Hpr-Hb complex does not bind to CD163; instead it persists in the serum bound to apolipoprotein L1 (ApoL1) (Nielsen et al. 2006). Hpr has an important role in protection against Trypanosoma brucei, the pathogen that causes human African trypanosomiasis, also known as sleeping sickness (Barrett et al. 2003; Smith et al. 1995). Trypanosomes rely on binding and internalisation of circulating plasma haptoglobin-haemoglobin (Hp-Hb) complexes to acquire the iron necessary for their survival. Together with ApoL1, Hpr-Hb forms a protein complex called trypanosome lytic factor-1 (TLF-1), which uses the trypanosome’s haptoglobin-haemoglobin receptor to deliver ApoL1 into the lysosomal compartment of the trypanosome, where the low pH triggers lysis (Drain et al. 2001; Vanhollebeke et al. 2008). This Trojan horse approach mediates effective killing of the trypanosome. TLF-1 efficiently lyses T. brucei brucei, a parasite of cattle that causes only self-resolving infection in humans. However, T. brucei rhodesiense, which causes acute human African trypanosomiasis (HAT) in East Africa, is protected against TLF-1 by the parasite’s SRA gene. In addition, TLF-1 does not appear to be effective in vitro against T. brucei gambiense, which is currently endemic in West and Central Africa, causes chronic HAT, and is responsible for most deaths from this disease. This is due, at least in part, to coding sequence changes in T. brucei gambiense that reduce the affinity of the receptor for TLF-1 (Kieft et al. 2010). Hpr is also a component of trypanosome lytic factor 2 (TLF-2), but this complex is less stable, less well studied and appears to contain many other components (Raper et al. 1999).
The evidence for interaction of HP and HPR genes with different pathogens prompted us to explore the Hp1/Hp2 polymorphism and the copy number variation (CNV) of the HPR gene in populations from around the world, to investigate the role of selection on alleles of these polymorphisms, and to test the role of increased HPR copy number in susceptibility to HAT.
A total of 952 DNA samples from 52 populations were obtained from the CEPH-Human Genome Diversity Project (HGDP) (Cann et al. 2002; Rosenberg 2006). DNA samples used in the HapMap Phase 1 project (CEU, European Americans from Utah; YRI, Yoruba from Ibadan, Nigeria; CHB, Chinese from Beijing; JPT, Japanese from Tokyo) were obtained from Coriell Cell Repositories.
The Yansi samples from the Democratic Republic of Congo (DRC) and HAT phenotyping have been fully described previously (Courtin et al. 2007). Individuals were classed as positive cases if both serology (card agglutination test) and parasitology (direct microscopic examination of blood or lymph for parasites) were positive. DNA was collected from 353 individuals, comprising 135 cases and 218 related controls from 109 pedigrees. All individuals were born in the area and exposed to the risk of infection since birth. The study was approved by both the ethics committee of the DRC Public Health Ministry and local traditional authorities.
Genotyping of the Hp1/2 polymorphism was performed using a previously developed PCR approach (Koch et al. 2002). Briefly, the assay consists of two separate PCR reactions that generate products of a characteristic size dependent on the genotype, which can then be separated by agarose gel electrophoresis and visualised by ethidium bromide staining (supplementary figure 1). Primers A and B (supplementary table 1) amplify a 1,757 bp region from the Hp1 allele and a 3,481 bp region from the Hp2 allele. To control for the possibility that the longer product is absent because of highly sheared genomic DNA, primers C and D amplify the junction fragment specific to the Hp2 duplicated allele, generating a 349 bp product. Seven control DNAs (supplementary table 2) with different genotypes were included in every experiment.
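The band logic of this two-reaction assay can be summarised as a small decision rule. The Python sketch below is an illustration, not the authors' scoring procedure: the function name `call_hp_genotype` is hypothetical, and it assumes each lane has already been scored by eye as a set of visible band sizes (in bp), using the product sizes given above.

```python
from __future__ import annotations

# Hypothetical scoring helper for the two-reaction Hp1/Hp2 PCR assay.
# Band sizes (bp) are those given in the text; the names are illustrative.
HP1_AB = 1757        # primers A+B, Hp1 allele
HP2_AB = 3481        # primers A+B, Hp2 allele (can drop out if DNA is sheared)
HP2_JUNCTION = 349   # primers C+D, junction fragment unique to the Hp2 duplication


def call_hp_genotype(bands: set[int]) -> str:
    """Return an Hp1/Hp2 genotype from the set of band sizes seen on the gel."""
    has_hp1 = HP1_AB in bands
    has_hp2 = HP2_JUNCTION in bands  # the junction PCR is the robust Hp2 indicator
    if HP2_AB in bands and not has_hp2:
        # A long Hp2 product without the Hp2 junction product is contradictory
        return "inconsistent"
    if has_hp1 and has_hp2:
        return "Hp1/Hp2"
    if has_hp1:
        return "Hp1/Hp1"
    if has_hp2:
        # Called as Hp2/Hp2 even if the 3,481 bp product failed to amplify
        return "Hp2/Hp2"
    return "no call"


print(call_hp_genotype({HP2_JUNCTION}))  # sheared Hp2/Hp2 DNA -> Hp2/Hp2
```

Treating the 349 bp junction product, rather than the 3,481 bp product, as the Hp2 indicator mirrors the rationale in the text: the long product can fail on sheared DNA, whereas the short junction fragment should still amplify.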
Copy number typing using the paralog ratio test
Two paralog ratio tests (PRTs) (Armour et al. 2007) were designed to measure HPR copy number by identifying paralogous segments of the haptoglobin region using the BLAST-like Alignment Tool (Fig. 1). PRT1 uses HP as its reference locus and therefore assumes that HP itself is not copy number variable. Deletion of HP has been observed as a cause of anhaptoglobinemia in Asians, with a frequency of <3 % (Koda et al. 1998). However, we saw no evidence of this deletion allele, which we predict would generate a clear discrepancy between the results of PRT1 and PRT2. The second PRT, using primers HP_PRT_1F and HP_PRT_1R (supplementary table 2), amplifies HPR and not HP, because it is targeted to the LTR insertion in the HPR intron, and co-amplifies several reference regions on other chromosomes, providing a second, independent measure of HPR copy number. Both PRTs were performed together as a duplex PCR in 1× Kapa PCR Buffer A (1.5 mM final Mg2+ concentration) with 0.5 U Taq DNA polymerase (Kapa Biosystems), 3 pmol of each primer and 5–10 ng genomic DNA in a final volume of 10 μl. PCR cycling conditions were 98 °C for 2 min, followed by 23 cycles of 98 °C for 20 s, 57 °C for 30 s and 70 °C for 1 min, and a final extension step of 70 °C for 10 min. Then 2 μl of the PCR product was added to 10 μl formamide with MapMarker400 ROX-labelled size standard (Eurogentec), denatured at 96 °C for 3 min, and electrophoresed on an ABI3130XL capillary electrophoresis machine.
Quantification of peaks of the electropherogram was performed using GeneMapper (Applied Biosystems), with samples rerun if the peak signal was saturated or very weak. HPR copy number was estimated by first calculating the test:reference peak-area ratio for both PRTs, and then correcting for inter-experimental variation by calibrating each ratio against the ratios of seven controls of known copy number included in each experiment (supplementary table 2). The distribution of the average of the two corrected PRT ratios for each sample, including all controls and replicates, was fitted to a Gaussian mixture model using the CNVtools package (Barnes et al. 2008), implemented in the statistical language R. After removal of three outlier samples (HGDP640, NA18503_13, NA19221_13), three Gaussian curves were fitted, constraining the means to be proportional to copy number and the variances of all distributions to be equal, reflecting the distributions of samples with HPR copy numbers of 2, 3 and 4. These Gaussian curves were used to generate an integer copy number call of 2, 3 or 4 for each sample, together with a posterior probability for each call.
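The calibration and constrained mixture-model steps can be sketched in plain NumPy. This is not CNVtools and does not reproduce its interface or its full model; it is a minimal expectation-maximisation (EM) illustration, assuming (i) a straight-line calibration of copy number on observed ratio across the control DNAs, and (ii) three components whose means are a common scale factor times 2, 3 and 4, with one shared variance, as described above. The function names (`calibrate`, `call_copy_number`) are hypothetical.

```python
import numpy as np

COPY_NUMBERS = np.array([2.0, 3.0, 4.0])  # HPR copy-number classes modelled in the text


def calibrate(sample_ratios, control_ratios, control_copy_numbers):
    """Map raw test:reference ratios onto a copy-number scale.

    Assumption: a straight-line fit of known copy number on observed ratio
    across the control DNAs run in the same experiment.
    """
    slope, intercept = np.polyfit(control_ratios, control_copy_numbers, deg=1)
    return intercept + slope * np.asarray(sample_ratios, dtype=float)


def _responsibilities(x, weights, scale, var):
    """E-step: posterior probability of each copy-number class for each sample."""
    means = scale * COPY_NUMBERS
    log_p = (np.log(weights)
             - 0.5 * np.log(2 * np.pi * var)
             - (x[:, None] - means[None, :]) ** 2 / (2 * var))
    log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
    resp = np.exp(log_p)
    return resp / resp.sum(axis=1, keepdims=True)


def call_copy_number(x, n_iter=200):
    """EM fit of a 3-component mixture with means = scale * copy number and one
    shared variance; returns integer calls, posterior probabilities and the scale."""
    x = np.asarray(x, dtype=float)
    weights, scale, var = np.full(3, 1 / 3), 1.0, 0.1  # starting values
    for _ in range(n_iter):
        resp = _responsibilities(x, weights, scale, var)
        # M-step under the constraints: proportional means, equal variances
        weights = np.clip(resp.mean(axis=0), 1e-12, None)
        weights /= weights.sum()
        scale = (resp * COPY_NUMBERS * x[:, None]).sum() / (resp * COPY_NUMBERS ** 2).sum()
        var = max((resp * (x[:, None] - scale * COPY_NUMBERS) ** 2).sum() / len(x), 1e-8)
    resp = _responsibilities(x, weights, scale, var)  # final E-step with fitted parameters
    calls = COPY_NUMBERS[resp.argmax(axis=1)].astype(int)
    return calls, resp.max(axis=1), scale
```

Constraining the means to be proportional to copy number means that the location of the sparse copy-number-4 class is informed by the better-populated classes through the shared scale factor, which is one practical reason to prefer a constrained over a free mixture here.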
The three APOL1 variant sites analysed were two SNPs (rs73885319 and rs60910145) and one 6 bp indel (rs71785313). These were amplified together in a single PCR product, using standard PCR conditions and primers APOL1F and APOL1R (supplementary table 1). Alleles at rs73885319 were distinguished by HindIII restriction digestion (A cut, G uncut), alleles at rs60910145 by NlaIII restriction digestion (G cut, T uncut), and alleles at rs71785313 by the 6 bp size difference following capillary electrophoresis on an ABI3130xl.
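For the two restriction-site SNPs, the digest readout reduces to a lookup. The sketch below encodes only the cut/uncut rules stated above; `RFLP_RULES` and `call_rflp` are hypothetical names, band presence is assumed to have been scored already, and the 6 bp indel (scored directly by fragment length) is not covered.

```python
# Hypothetical lookup built from the digest rules stated in the text:
# variant -> (enzyme, allele that is cut, allele that stays uncut)
RFLP_RULES = {
    "rs73885319": ("HindIII", "A", "G"),
    "rs60910145": ("NlaIII", "G", "T"),
}


def call_rflp(variant, cut_bands_seen, uncut_band_seen):
    """Call a biallelic genotype from presence/absence of digested and undigested products."""
    _enzyme, cut_allele, uncut_allele = RFLP_RULES[variant]
    if cut_bands_seen and uncut_band_seen:
        # Heterozygote, assuming digestion went to completion (an undigested
        # remnant from a partial digest could otherwise mimic a heterozygote)
        return "".join(sorted(cut_allele + uncut_allele))
    if cut_bands_seen:
        return cut_allele * 2
    if uncut_band_seen:
        return uncut_allele * 2
    return None  # failed PCR or digest: no call


print(call_rflp("rs73885319", True, True))  # AG
```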
Fiber Fluorescent in situ Hybridisation (Fiber-FISH)
Fiber-FISH was performed as described previously (Perry et al. 2008). Briefly, stretched DNA fibers were prepared from lymphoblastoid cell lines. A fosmid clone (G248P85613E6) that contains the HPR gene and a reference clone (G248P84443C9) were obtained from the clone archive resource of the Wellcome Trust Sanger Institute. Fosmid DNA was prepared using the Phase-Prep BAC DNA kit (Sigma-Aldrich) following the manufacturer’s protocol. The HPR clone was labelled with dinitrophenol (DNP)-11-dUTP (PerkinElmer) and detected with rabbit anti-DNP and Alexa 488-conjugated goat anti-rabbit IgG. The reference clone was labelled with digoxigenin (DIG)-11-dUTP (Roche) and detected with monoclonal mouse anti-DIG IgG (Sigma-Aldrich) and Texas red-conjugated donkey anti-mouse IgG (Invitrogen). After detection, slides were mounted with SlowFade Gold® (Invitrogen) mounting solution containing 4′,6-diamidino-2-phenylindole (DAPI; Invitrogen). Images were captured on a Zeiss Axioplan fluorescent microscope and processed with the SmartCapture software (Digital Scientific UK).
Population genetic analyses
FST calculations were performed using Arlequin 3.5 or the R package HIERFSTAT (Excoffier and Lischer 2010; Goudet 2004). For each pair of populations, the percentile rank of FST for the HPR duplication was obtained by comparison with the distribution of FST values calculated for SNPs genotyped in the HGDP panel with a minor allele frequency (MAF) similar to that of the HPR duplication allele. Specifically, for each pairwise comparison the mean MAF of the HPR duplication in the two populations was calculated, and HGDP SNPs within a MAF range of ±0.02 of this value were used to obtain the distribution of FST values.
Pathogen absence/presence matrices were constructed for the 21 countries where the HGDP populations are located, based on the Gideon database, as described previously (Fumagalli et al. 2009). Briefly, pathogen diversity was calculated from these data for each population, taking into account only species/genera that are transmitted in the 21 countries; cases of transmission caused by tourism and immigration were therefore not taken into account, and species that have recently been eradicated, for example as a result of vaccination campaigns, were recorded as present in the matrix. Malaria prevalence was obtained from either the Gideon or WHO databases, as previously described (Pozzoli et al. 2010). To account for the demographic history of human populations, correlations were calculated using partial Mantel tests. Specifically, matrices were computed as pairwise Euclidean distances in allele frequency, distance from East Africa, and pathogen diversity or malaria prevalence (from either WHO or Gideon). Distances from Africa were taken from previous work (Handley et al. 2007) and refer to a model of human migration from East Africa along landmasses, avoiding mountain regions with altitude over 2,000 m. The statistical significance of correlation tests was calculated by performing 10,000 permutations of pathogen diversity or malaria prevalence within continental regions; these were defined as previously suggested (Li et al. 2008) (i.e. Africa, Europe, America, Central-South Asia, East Asia, Oceania), with Middle Eastern populations grouped with Europeans. Partial Mantel correlations were performed using the Vegan R package.
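The MAF-matched percentile ranking can be sketched as follows. To keep the example self-contained, the sketch uses Hudson's single-locus FST estimator. This is a swapped-in choice: Arlequin and HIERFSTAT use their own estimators, so absolute values will differ. `hudson_fst` and `maf_matched_percentile` are hypothetical names; the ±0.02 window matches the text.

```python
import numpy as np


def hudson_fst(p1, p2, n1, n2):
    """Hudson's single-locus FST for allele frequencies p1, p2 estimated from
    n1, n2 sampled chromosomes (a swapped-in estimator, not Arlequin/HIERFSTAT)."""
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return float("nan") if den == 0 else num / den  # undefined if monomorphic in both


def maf_matched_percentile(target_fst, target_mean_maf, snp_fst, snp_mean_maf, window=0.02):
    """Percentile rank of target_fst among SNPs whose mean MAF in the same two
    populations lies within +/- window of the target's mean MAF."""
    snp_fst = np.asarray(snp_fst, dtype=float)
    snp_mean_maf = np.asarray(snp_mean_maf, dtype=float)
    matched = snp_fst[np.abs(snp_mean_maf - target_mean_maf) <= window]
    if matched.size == 0:
        return float("nan")
    return 100.0 * np.mean(matched <= target_fst)


# Hypothetical example: duplication at 15 % vs 0 %, 100 chromosomes per population
p_a, p_b = 0.15, 0.0
fst = hudson_fst(p_a, p_b, 100, 100)
rank = maf_matched_percentile(fst, (p_a + p_b) / 2,
                              snp_fst=[0.01, 0.02, 0.05, 0.2, 0.5],
                              snp_mean_maf=[0.07, 0.08, 0.09, 0.3, 0.075])
print(round(fst, 4), rank)
```

Matching on MAF matters because FST for a biallelic site is constrained by allele frequency, so a rare allele is most fairly compared with SNPs of similar rarity.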
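A partial Mantel test with permutations restricted to continental regions can be sketched as follows. This is a simplified illustration rather than the vegan implementation used in the study. It computes Pearson partial correlations on the upper triangle of each distance matrix, tests one-sidedly for positive association, and shuffles pathogen values among populations within the same region. The function names are hypothetical.

```python
import numpy as np


def pairwise_distances(values):
    """Euclidean distance matrix between populations for a per-population variable."""
    v = np.asarray(values, dtype=float).reshape(len(values), -1)
    return np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))


def partial_corr(x, y, z):
    """Pearson correlation of x and y after controlling for z."""
    r = np.corrcoef(np.vstack([x, y, z]))
    rxy, rxz, ryz = r[0, 1], r[0, 2], r[1, 2]
    return (rxy - rxz * ryz) / np.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def partial_mantel(allele_freq, pathogen, dist_east_africa, regions, n_perm=10000, seed=0):
    """One-sided partial Mantel test of allele-frequency vs pathogen distances,
    controlling for distance from East Africa; pathogen values are permuted
    among populations only within the same continental region."""
    iu = np.triu_indices(len(allele_freq), k=1)  # each population pair once
    a = pairwise_distances(allele_freq)[iu]
    z = pairwise_distances(dist_east_africa)[iu]
    path = np.asarray(pathogen, dtype=float)
    regions = np.asarray(regions)
    observed = partial_corr(a, pairwise_distances(path)[iu], z)
    rng = np.random.default_rng(seed)
    n_as_extreme = 0
    for _ in range(n_perm):
        order = np.arange(len(path))
        for region in np.unique(regions):
            idx = np.flatnonzero(regions == region)
            order[idx] = rng.permutation(idx)  # shuffle within the region only
        r = partial_corr(a, pairwise_distances(path[order])[iu], z)
        n_as_extreme += int(r >= observed)
    return observed, (n_as_extreme + 1) / (n_perm + 1)
```

Restricting permutations to regions keeps broad continental differences intact under the null, so a significant result has to come from within-region covariation. vegan's `mantel.partial` accepts a `strata` argument for this kind of restricted permutation.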
Haplotype phasing was performed using the Bayesian method implemented in PHASE 2.1 (Stephens and Donnelly 2003). For short-range haplotype analysis, SNP genotypes of HGDP and HapMap samples for 8 SNPs flanking the HP/HPR CNV region were downloaded using the SPSmart portal (Jorge et al. 2008). These 8 SNPs spanned 55 kb immediately flanking the HP/HPR CNV region, and were selected on the basis of genotypes being available on the HGDP panel, and not being within the copy number variable region itself. The Hp1/2 polymorphism and the HPR duplication polymorphism were coded as diallelic SNPs for phasing. For long-range haplotype phasing, SNP genotypes from 2 Mb surrounding the HP gene for the YRI population were downloaded from the International HapMap Project (release 23a) and from the CEPH-HGDP website. The HapMap data consisted of 2218 genotypes (~1 SNP per kb) and HGDP data consisted of 394 genotypes (~1 SNP per 5 kb) from a custom Affymetrix SNP chip (Genome-wide Human Origins 1) courtesy of David Reich and colleagues. The design of this SNP chip was informed by low-coverage resequencing of 12 CEPH-HGDP samples and the low-coverage sequencing of the archaic hominids Neanderthal and Denisovan, and therefore the SNPs represented on the chip are likely to be more representative of common global genetic diversity. For the YRI, SNP genotypes with non-Mendelian inheritance were removed, and, for phasing using PHASE, all data were prepared using the software PLATO (http://ritchielab.psu.edu/ritchielab/project-plato/).
Extended haplotype analysis
We used the R package REHH for all extended haplotype analyses and plots (Gautier and Vitalis 2012). SNP physical map positions were converted to genetic map positions based on the Rutgers second-generation linkage map (Matise et al. 2007). Extended haplotype homozygosity (EHH; Sabeti et al. 2002) was calculated for both the Hp1/2 and HPR duplication polymorphisms, for all SNPs until EHH <0.05. The integrated haplotype score (iHS) was calculated for all SNPs, with an allele frequency bin of 0.2 to standardise each iHS score against other SNPs of its frequency class within the region. P values were calculated assuming a Gaussian distribution of iHS scores under the neutral model; this assumption was checked by plotting the values against a Gaussian distribution. The age of the HPR duplication was estimated from linkage disequilibrium using the relationship $\mathrm{EHH} \approx \Pr(\text{homozygosity}) = e^{-2rg}$, where $r$ is the genetic distance in Morgans from the HPR allele and $g$ is the age of the allele in generations (Voight et al. 2006). Rearranging gives $-\ln(\mathrm{EHH}) \approx 2rg$, so we estimated the age of the allele by regressing $-\ln(\mathrm{EHH})$ on $2r$ at various genetic distances from the HPR allele, the gradient of the regression line being equal to $g$. Age in generations was converted to years by multiplying by a generation time of 27 years (Fenner 2005).
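The EHH calculation and the regression-based age estimate can be sketched in a few lines. This is not the REHH interface. It is a minimal illustration that makes four assumptions: haplotypes are phased and coded 0/1 in map order; EHH is the probability that two randomly chosen chromosomes carrying the core allele are identical over the whole interval from the core to a marker; the least-squares line is fitted through the origin, since $-\ln(\mathrm{EHH})$ is zero at the core; and the function names are hypothetical.

```python
from collections import Counter

import numpy as np


def ehh(haplotypes, core_index, core_allele, marker_index):
    """EHH between a core site and one flanking marker.

    haplotypes: chromosomes x sites array of 0/1 alleles, sites in map order.
    Returns the probability that two random chromosomes carrying core_allele
    are identical at every site from the core to the marker.
    """
    h = np.asarray(haplotypes)
    carriers = h[h[:, core_index] == core_allele]
    n = len(carriers)
    if n < 2:
        return float("nan")
    lo, hi = sorted((core_index, marker_index))
    counts = Counter(map(tuple, carriers[:, lo:hi + 1]))  # distinct extended haplotypes
    identical_pairs = sum(c * (c - 1) / 2 for c in counts.values())
    return identical_pairs / (n * (n - 1) / 2)


def estimate_age(ehh_values, distances_morgans, generation_years=27):
    """Least-squares fit of -ln(EHH) = g * (2r) through the origin.

    Returns (g in generations, age in years)."""
    y = -np.log(np.asarray(ehh_values, dtype=float))
    x = 2 * np.asarray(distances_morgans, dtype=float)
    g = float(x @ y / (x @ x))
    return g, g * generation_years
```

As a point of scale, with a generation time of 27 years the 3,400 to 4,200 year range reported for the HPR duplication corresponds to roughly 126 to 156 generations.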
Family-based association tests
Family-based association tests were performed for the HPR duplication and three variants (two SNPs and an indel) in the APOL1 gene using FBAT v2.0.4 software (De et al. 2013; Horvath et al. 2001). Single-variant tests were performed under an additive model, and the empirical variance (the -e option) was used so that the test remains valid as a test of association within pedigrees. Each variant was analysed individually, and all variants were also analysed together using a collapsing method originally designed for rare-variant analysis. The unweighted statistic was calculated (using option -v0) because the minor allele frequencies of the polymorphisms were similar, and because low minor allele frequencies are vulnerable to stochastic sampling variation in a small dataset.
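For intuition about what the additive single-variant statistic measures, the sketch below computes a simplified FBAT-style Z score for affected children in trios with both parents genotyped. It is not FBAT v2.0.4. It ignores multi-offspring pedigrees, missing parents and the empirical-variance (-e) correction, and it does not implement the collapsing test. Genotypes are counts (0 to 2) of the tested allele; for the HPR duplication, copy numbers 2, 3 and 4 would be coded 0, 1 and 2. The function name is hypothetical.

```python
import math


def trio_transmission_z(trios):
    """Simplified additive FBAT-style test for affected children in parent-child trios.

    trios: iterable of (father, mother, child) allele counts (0, 1 or 2).
    Returns (number of informative trios, Z, one-tailed P for undertransmission).
    """
    s = expected = variance = 0.0
    n_informative = 0
    for father, mother, child in trios:
        p_f, p_m = father / 2, mother / 2  # chance each parent transmits the allele
        var = p_f * (1 - p_f) + p_m * (1 - p_m)  # Mendelian variance of the child's count
        if var == 0:
            continue  # both parents homozygous: uninformative
        n_informative += 1
        s += child
        expected += p_f + p_m
        variance += var
    if variance == 0:
        return n_informative, float("nan"), float("nan")
    z = (s - expected) / math.sqrt(variance)
    p_one_tailed = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z)
    return n_informative, z, p_one_tailed
```

A negative Z means that affected children received the tested allele less often than Mendelian expectation, matching the sign convention of the FBAT table below, and the one-tailed P value tests for undertransmission.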
Accuracy of HPR copy number calling
Precise and accurate calling of copy number presents technical challenges. Where the copy number variable region is small and its structure well defined, PCR across the whole region followed by size separation, together with junction-fragment PCR, is a robust strategy, and we use it here to genotype the 1.7 kb duplication responsible for the Hp1/2 polymorphism. However, for larger CNV regions, which often have unclear structures, DNA sequence is usually quantified by hybridisation or quantitative PCR; such methods are prone to noise and need to be well validated.
Allele frequency in different populations
For HPR copy number, we took a copy number of 3 as a heterozygous duplication and a copy number of 4 as a homozygous HPR duplication. The genotype frequencies of the HPR duplication allele in the different populations were all in Hardy–Weinberg equilibrium, and the deduced allele frequencies in the HGDP panel populations are shown in supplementary table 3 and Fig. 4b. The HPR duplication allele is restricted to Africa, except for two heterozygotes, one Druze and one Palestinian. We found no instances of the duplication in the CEU, JPT and CHB HapMap phase 1 panels, consistent with array-CGH data.
As an alternative analysis, we calculated the pairwise FST value between each pair of populations for both polymorphisms. For the HPR CNV, this analysis is of limited value because the duplication allele is present only in African populations; nevertheless, pairwise FST values involving the Mozabite population are unusually high (Fig. 4d; supplementary figure 3a). For the Hp1/2 polymorphism, pairwise comparisons involving the Pima and Papuan populations show high FST values, reflecting a relatively high frequency of Hp1 in those populations (supplementary figure 3b). Because the Hp1/2 polymorphism was one of the first protein polymorphisms identified, a considerable amount of population allele frequency data has been published, and these data have recently been summarised in a review (Carter and Worwood 2007). We took allele frequency data from this review to extend our FST analysis to a total of 122 populations (supplementary figure 3c). This analysis suggests that the high FST values of the Pima and Papuans are shared with other Native American and Oceanian populations, and this is the only noticeable difference between the population groups.
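Converting copy-number calls to allele frequencies and checking Hardy–Weinberg proportions is simple arithmetic. The sketch below treats copy numbers 2, 3 and 4 as 0, 1 and 2 duplication alleles, as in the text, and uses a one-degree-of-freedom chi-square test for brevity. With the small genotype counts typical of single populations, an exact HWE test would be the safer choice. The function name is hypothetical.

```python
import math


def hpr_dup_frequency_and_hwe(copy_numbers):
    """Duplication allele frequency plus a 1-df chi-square HWE test, treating
    HPR copy number 2, 3, 4 as 0, 1, 2 copies of the duplication allele."""
    counts = {2: 0, 3: 0, 4: 0}
    for cn in copy_numbers:
        if cn not in counts:
            raise ValueError(f"unexpected HPR copy number: {cn}")
        counts[cn] += 1
    n = sum(counts.values())
    q = (counts[3] + 2 * counts[4]) / (2 * n)  # duplication allele frequency
    p = 1 - q
    expected = {2: n * p * p, 3: 2 * n * p * q, 4: n * q * q}
    chi2 = sum((counts[g] - expected[g]) ** 2 / expected[g]
               for g in counts if expected[g] > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return q, chi2, p_value
```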
Analysis of haplotype context
Table. Haplotypes occurring at ≥1 %
Table. Extended haplotype statistics for 2 Mb surrounding the HP/HPR region. Columns: iHS (p) for Hp1/Hp2; iHS (p) for the HPR duplication; strongest iHS signal in region
The breakdown of linkage disequilibrium (LD) by recombination on a haplotype can be used to estimate the age of an allele carried on that haplotype, independently of its frequency. Using this approach on the combined West and Central African data, we estimate the age of the HPR duplication to be between 3,400 and 4,200 years, which is consistent with the adoption of agriculture in West Africa.
Analysis of pathogen diversity
Fig. 4 shows that the HPR duplication is found in populations that are likely to be exposed, or to have been exposed, to T. brucei gambiense. Unfortunately, given the small number of analysed populations that carry the HPR duplication allele and the greatly fluctuating estimates of T. brucei gambiense sleeping sickness incidence across the region, a formal correlation analysis with pathogen diversity would have little power and is likely to yield spurious results. However, it has been previously suggested that malaria prevalence might be responsible for the global variation in Hp1/Hp2 allele frequency. To assess any possible effect of natural selection by pathogen pressure on allele frequency, we correlated the allele frequency of Hp1/2 with a number of pathogen diversity indices, as described previously. We used the non-parametric partial Mantel test, which controls for distance from East Africa, the main explanatory variable for allele frequency clines in humans owing to the out-of-Africa range expansion. We found no significant correlation of the Hp2 allele with any pathogen diversity index, including malaria prevalence (data not shown).
Family-based study of trypanosomiasis and genes encoding TLF-1 components
The lack of power of the within-Africa correlation with pathogen diversity indices, described above, led us to test directly the hypothesis that the HPR duplication allele mediates HAT resistance, presumably through a gene dosage effect, so that HAT might act as a selective agent on the HPR duplication allele. We genotyped 135 cases and 218 related individuals for the HPR duplication and for two SNPs and an indel in the APOL1 gene. These three polymorphisms have previously been shown to have undergone natural selection in Yoruba, and they encode protein variants with an increased ability to lyse trypanosomes (Genovese et al. 2010). Although the cohort is relatively small, the family-based approach controls for population stratification; the sampled individuals are from the Bandundu province of the Democratic Republic of Congo, which has a high prevalence of trypanosomiasis, around 15 %, rising to 70 % in some villages (Ekwanzala et al. 1996). The allele frequency of the HPR duplication in unrelated individuals was 0.101, consistent with its distribution in West and Central Africa.
Table. FBAT analysis under an additive model for associations between HPR and APOL1 polymorphisms and HAT
Columns: number of informative families; Z value (a negative sign indicates undertransmission to HAT cases); P value (one-tailed)
Rows: HPR copy number; G (342 Glycine); G (384 Methionine); all protective alleles, combined; combined without rs73885319, all protective alleles; combined without rs60910145, all protective alleles; combined without HPR, all protective alleles
In this study we characterise the HPR duplication, which has previously been observed only in African-Americans, and, based on its allele frequency distribution, confirm its likely origin in West Africa. The original report describing the HPR duplication also described individuals with higher copy numbers, up to 6 copies of HPR on a single chromosome, characterised by Southern blot. We found no evidence of HPR copy numbers beyond a simple duplication, so we consider such high-copy-number HPR chromosomes likely to be very rare in the population. It should be noted that the original study selected some individuals on the basis of an unusual haemoglobin phenotype; given that Hpr binds haemoglobin, this may have enriched for unusual HPR genotypes. We show that the HPR duplication lies on a single haplotype and is therefore likely to have arisen once, and we confirm the finding of the original study that the HPR duplication occurred on an Hp2 allelic background (Maeda et al. 1986).
The distribution of the HPR duplication, concentrated in West and Central Africa, supports a hypothesis in which increased levels of Hpr (a component of TLF-1), and hence alleles with higher HPR copy number, are selected for because they improve resistance to HAT. We examined the surrounding genomic region for signatures of natural selection using extended haplotype tests. These detect recent hard selective sweeps, and there is previously published evidence suggesting that such a selective sweep acted on alleles of the APOL1 gene that confer resistance against T. brucei (Genovese et al. 2010; Ko et al. 2013). The evidence for a similar sweep acting on the HPR duplication is equivocal: the observed extended haplotype is better explained by stronger selective sweeps acting on other SNP alleles within the 2 Mb region analysed. On its own, the HPR duplication shows no significant association with protection against human African trypanosomiasis in an area of Central Africa where the T. b. gambiense parasite is highly endemic and causes repeated epidemics of sleeping sickness. However, taken together with alleles at the APOL1 gene, the data are consistent with the HPR locus contributing to resistance to T. b. gambiense sleeping sickness, a possible selective agent, by increasing the effectiveness of TLF-1. This association should be treated with caution, as the study is underpowered to detect small effects of variants with allele frequencies <0.2, and ideally it should be confirmed in a larger cohort, if one becomes available. We also did not test for a gene dosage effect of the HPR duplication; this is not straightforward, given the rarity of the duplication allele and the similarity at the protein level between haptoglobin-related protein and haptoglobin, the latter being present at much higher levels than the former in serum from healthy individuals (Muranjan et al. 1998).
Taken together, the data described in this study suggest that, in vivo, the HAT-protective allelic variants of APOL1 and HPR help the host to overcome the reduced affinity of the haptoglobin receptor for TLF-1 that characterises T. b. gambiense, and this should be tested experimentally. The HP/HPR region varies in copy number in rhesus macaques (Perry et al. 2008), and given that trypanosomes naturally occur in macaques, this might be an alternative model system for further analysis.
The observation of HPR duplication alleles at appreciable frequency in the Mozabite (Berber) population of Algeria is perhaps surprising, as they are a non-sub-Saharan population, often grouped with Middle Eastern populations, and trypanosomiasis is not endemic in Algeria. However, pollen analysis shows that North Africa was lusher 6,000 years ago than under today's arid conditions, and therefore may have been within the range of the tsetse fly, the vector of T. brucei (Jolly et al. 2008; Steverding 2008). Trypanosomiasis, at least in animals, was recorded by the ancient Egyptians 4,000 years ago in an area now free of both the disease and the vector, so it is possible that the HPR duplication observed in North Africa and the Middle East reflects selective events in the past, when trypanosomiasis may have been endemic. Alternatively, the Mozabite are known to have been nomadic, roaming as far south as the Niger and Senegal rivers in West Africa, so they may have acquired the HPR duplication from populations to the south in tropical endemic areas. Dating of the HPR duplication allele to between 3,400 and 4,200 years ago suggests that it originated soon after the development of agriculture in West Africa, possibly after the drying of the Sahara region and the consequent southward retreat of the northern limit of the tsetse fly, the vector for trypanosomiasis.
It has previously been suggested that the distribution of the Hp2 allele has been driven by malaria selection pressure. Our data do not support this, because we did not find any correlation between the Hp2 allele and a number of pathogen diversity indices, including two malaria prevalence indices and a protozoan diversity index, a large proportion of which is due to Plasmodium falciparum and P. vivax. One caveat is that our malaria prevalence estimates are on a country-by-country basis for 21 countries and, of course, reflect current prevalence levels rather than the past levels that may have given rise to the allele frequencies seen today. Nevertheless, this approach has identified alleles of well-known genes that are likely to have undergone selection by malaria, such as GYPC (glycophorin C), ABO (ABO blood group) and SLC4A1 (erythrocyte membrane protein band 3). A recent study also suggests that an uncommon haplotype carrying the Hp2 allele shows some evidence of the long extended haplotypes characteristic of recent natural selection, but in light of our data this seems to be a signal of selection at the HPR duplication or another nearby allele (Rodriguez et al. 2012).
There are several examples of CNV mediating different susceptibilities to infectious diseases (Hardwick et al. 2012; Mockenhaupt et al. 2004; Pelak et al. 2011). Although infectious disease is likely to have been, and to remain, a strong agent of natural selection on humans, detecting signatures of selection at copy number variable regions remains difficult, and typically relies on identifying unusually high genetic differentiation between populations or continents (Hardwick et al. 2011; Iskow et al. 2012; Perry et al. 2007). Extended haplotype tests for selection can be used only when a particular copy number allele occurs on one haplotype. Here we investigate a possible example of natural selection for infectious disease resistance increasing the frequency of a copy number variant. The functional basis for this selection is well supported, and the other component of TLF-1, APOL1, also shows a similar signature of selection in west Africa (Genovese et al. 2010). However, our data are equivocal on the evidence for a selective advantage of the HPR duplication. This may be a real observation, or our analyses may be underpowered because of small sample sizes, particularly given a duplication allele frequency between 0.1 and 0.15. Further data from west and central African populations are required to fully characterise the patterns of selection in this genomic region, and a larger epidemiological study of HAT would also be an important future research avenue. HAT has had a profound impact on human and domestic animal evolution, and understanding its effect on genomes remains an important goal.
This work was funded by a Medical Research Council New Investigator Award GO801123 to E.J.H. We would like to thank Jenny Bowdrey for technical support, Mark Jobling for access to an ABI3130XL capillary electrophoresis platform, and the DNA donors for supporting this work. Thanks are also due to Uberto Pozzoli for help with the FST calculations, and Pierpaolo Maisano Delser for discussions and help with file format conversion. This research used the ALICE High Performance Computing Facility at the University of Leicester.
- Atkinson SH, Mwangi TW, Uyoga SM, Ogada E, Macharia AW, Marsh K, Prentice AM, Williams TN (2007) The haptoglobin 2-2 genotype is associated with a reduced incidence of Plasmodium falciparum malaria in children on the coast of Kenya. Clin Infect Dis 44:802–809
- Cann HM, de Toma C, Cazes L, Legrand MF, Morel V, Piouffre L, Bodmer J, Bodmer WF, Bonne-Tamir B, Cambon-Thomsen A, Chen Z, Chu J, Carcassi C, Contu L, Du R, Excoffier L, Ferrara GB, Groot H, Gurwitz D, Jenkins T, Herrera RJ, Huang X, Kidd J, Kidd KK, Langaney A, Lin AA, Mehdi SQ, Parham P, Piazza A, Pistillo MP, Qian Y, Shu Q, Xu J, Zhu S, Weber JL, Greely HT, Feldman MW, Thomas G, Dausset J, Cavalli-Sforza LL (2002) A human genome diversity cell line panel. Science 296:261
- Conrad DF, Pinto D, Redon R, Feuk L, Gokcumen O, Zhang Y, Aerts J, Andrews TD, Barnes C, Campbell P, Fitzgerald T, Hu M, Ihm C, Kristiansson K, Macarthur D, Macdonald J, Onyiah I, Pang A, Robson S, Stirrups K, Valsesia A, Walter K, Wei J, Wellcome Trust Case Control Consortium, Tyler-Smith C, Carter N, Lee C, Scherer S, Hurles M (2009) Origins and functional impact of copy number variation in the human genome. Nature 464:704–712
- Genovese G, Friedman DJ, Ross MD, Lecordier L, Uzureau P, Freedman BI, Bowden DW, Langefeld CD, Oleksyk TK, Uskinski Knob AL, Bernhardy AJ, Hicks PJ, Nelson G, Vanhollebeke B, Winkler C, Kopp J, Pays E, Pollak MR (2010) Association of trypanolytic ApoL1 variants with kidney disease in African Americans. Science 329:841–845
- Hardwick RJ, Machado LR, Zuccherato LW, Antolinos S, Xue Y, Shawa N, Gilman RH, Cabrera L, Berg DE, Tyler-Smith C, Tarazona-Santos E, Hollox EJ (2011) A worldwide analysis of beta-defensin copy number variation suggests recent selection of a high-expressing DEFB103 gene copy in East Asia. Hum Mutat 32:743–750
- Hardwick RJ, Amogne W, Mugusi S, Yimer G, Ngaimisi E, Habtewold A, Minzi O, Makonnen E, Janabi M, Machado LR, Viskaduraki M, Mugusi F, Aderaye G, Lindquist L, Hollox EJ, Aklillu E (2012) β-defensin genomic copy number is associated with HIV load and immune reconstitution in sub-Saharan Africans. J Infect Dis 206:1012–1019
- Jolly D, Prentice IC, Bonnefille R, Ballouche A, Bengo M, Brenac P, Buchet G, Burney D, Cazet JP, Cheddadi R, Edorh T, Elenga H, Elmoutaki S, Guiot J, Laarif F, Lamb H, Lezine A-M, Maley J, Mbenza M, Peyron O, Reille M, Reynaud-Ferrara I, Riollet G, Ritchie J, Roche E, Scott L, Ssemmanda I, Straka H, Umer M, Van Campo E, Vilimumbalo S, Vincens A, Waller M (2008) Biome reconstruction from pollen and plant macrofossil data for Africa and the Arabian peninsula at 0 and 6000 years. J Biogeogr 25:1007–1027
- Ko W-Y, Rajan P, Gomez F, Scheinfeldt L, Froment A, Nyambo TB, Omar SA, Wambebe C, Ranciaro A, Hirbo JB, Tishkoff SA (2013) Identifying Darwinian selection acting on different human APOL1 variants among diverse African populations. Am J Hum Genet 93:54–66
- Pelak K, Need AC, Fellay J, Shianna KV, Feng S, Urban TJ, Ge D, De Luca A, Martinez-Picado J, Wolinsky SM, Martinson J, Jamieson B, Bream J, Martin M, Borrow P, McMichael A, Haynes B, Telenti A, Carrington M, Goldstein D, Alter G, NIAID Center for HIV/AIDS Vaccine Immunology (2011) Copy number variation of KIR genes influences HIV-1 control. PLoS Biol 9:e1001208
- Sabeti PC, Reich DE, Higgins JM, Levine HZP, Richter DJ, Schaffner SF, Gabriel SB, Platko JV, Patterson NJ, McDonald GJ, Ackerman H, Campbell S, Altshuler D, Cooper R, Kwiatkowski D, Ward R, Lander ES (2002) Detecting recent positive selection in the human genome from haplotype structure. Nature 419:832–837
- Smith AB, Esko JD, Hajduk SL (1995) Killing of trypanosomes by the human haptoglobin-related protein. Science 268:284–286
- Steverding D (2008) The history of African trypanosomiasis. Parasit Vectors 1
- Vanhollebeke B, De Muylder G, Nielsen MJ, Pays A, Tebabi P, Dieu M, Raes M, Moestrup SK, Pays E (2008) A haptoglobin-hemoglobin receptor conveys innate immunity to Trypanosoma brucei in humans. Science 320:677
Open Access. This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.