
Japanese Journal of Clinical Oncology Pages 490-493


Detection of Gene-Environment Interaction by Case-only Studies


Nobuyuki Hamajima1, Hidemichi Yuasa1,2, Keitaro Matsuo1,3 and Yohko Kurobe1

1Division of Epidemiology, Aichi Cancer Center Research Institute, Nagoya, 2Department of Oral and Maxillofacial Surgery, Nagoya City Jyouhoku Municipal Hospital, Nagoya and 3Nagoya University Graduate School of Medicine, Nagoya, Japan

Background: The detection of gene-environment interaction can provide important clues not only for resolving biological mechanisms underlying diseases, but also for disease prevention. The newly introduced case-only study was compared with traditional case-control study in terms of statistical power to detect significant gene-environment interaction.
Methods: Odds ratios for interaction were calculated separately in the framework of the case-control study and the case-only study, using an unconditional logistic model. Hypothetical data with 200 cases and 200 or 400 controls, and real published data derived from four cancer case-control studies of genotype and smoking, were used for the comparisons.
Results: Although odds ratio estimates for interaction were the same, 95% confidence intervals were narrower in case-only studies than in case-control studies. Similarly, there were no substantial differences in point estimates for interaction in four real cancer case-control studies between the two study designs, but the confidence intervals were narrower with the case-only study.
Conclusions: Although the case-only study does not provide odds ratios for exposure or genotype alone, it is very useful for the detection of interaction, especially for screening purposes.

Key words: case-only study - case-control study - gene-environment interaction - odds ratio

INTRODUCTION

Molecular genetic technology has become available for examining the joint effects of genetic traits and environmental exposures (gene-environment interactions) in the etiology of diseases. Traditional study designs, i.e. the cohort study and the case-control study, can be and indeed have been applied to detect interactions, but their power is usually insufficient for this purpose (1). A new design, the case-only study, has therefore been introduced to improve the power to identify interactions, under the assumption of independence between exposure and genotype (2-4). The logic is clear, but there is still skepticism among clinical researchers and even epidemiologists. This paper demonstrates the advantages of the case-only study over the case-control study, using hypothetical and real reported data. It is hoped that the results will be convincing for clinical researchers, given the advantages of an inductive approach based on examples over a deductive strategy based on mathematics.

MATERIALS AND METHODS

Notations used for definitions of odds ratio (OR) and gene-environment interaction are shown in Table 1. Notations a1, b1, c1 and d1 are, respectively, numbers for cases without the genotype under study who were not exposed to the environmental factor under study, exposed cases without the genotype, unexposed cases with the genotype and exposed cases with the genotype. Notations a2, b2, c2 and d2 are numbers for corresponding controls. The relevant ORs are defined as shown in Table 1. Based on a multiplicative model, interaction is defined here as the ratio of the OR for the exposed divided by the OR for the unexposed or the OR of those with the genotype divided by the OR of those without, equal to (a1d1/b1c1)/(a2d2/b2c2), i.e. the OR among cases divided by the OR among controls. The latter OR is equal to 1 when exposure and genotype are independent among controls. In that case, the value for the interaction is equal to a1d1/b1c1. This means that we can obtain an estimate for interaction solely from cases. This is the logic underlying case-only studies (2-4).
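As a rough illustration of the definition above, the following Python sketch computes the interaction OR and a Woolf-type 95% confidence interval directly from the eight cell counts. The function name `interaction_or` and the closed-form variance (sum of reciprocal cell counts) are assumptions made for illustration. The analysis in this paper used a logistic model, which should give the same point estimate and Wald interval when the model is saturated.

```python
import math


def interaction_or(cases, controls=None, z=1.96):
    """Return (OR, lower, upper) for the multiplicative interaction.

    cases and controls are (a, b, c, d) tuples in the order of Table 1.
    With controls=None, the case-only estimate a1*d1/(b1*c1) is returned,
    which equals the interaction only if exposure and genotype are
    independent among controls.
    """
    a1, b1, c1, d1 = cases
    log_or = math.log(a1 * d1 / (b1 * c1))
    var = sum(1.0 / n for n in cases)  # Woolf variance of the log OR
    if controls is not None:
        a2, b2, c2, d2 = controls
        log_or -= math.log(a2 * d2 / (b2 * c2))
        var += sum(1.0 / n for n in controls)
    half_width = z * math.sqrt(var)
    return (math.exp(log_or),
            math.exp(log_or - half_width),
            math.exp(log_or + half_width))


# First row of Table 2 (200 cases, 200 controls)
cases = (160, 18, 18, 4)
controls = (162, 18, 18, 2)
print(interaction_or(cases, controls))  # approximately 1.98 (0.28-13.8)
print(interaction_or(cases))            # approximately 1.98 (0.60-6.48)
```

The point estimate is the same in both calls because the control OR in this row is exactly 1. Only the interval changes, since the case-control variance carries four extra reciprocal terms.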

Hypothetical data were generated as follows: prevalence of the exposure under study in the population (e) was set at 0.1 or 0.3 and the prevalence of the genotype under study in the population (g) at 0.1, 0.3 or 0.5. The OR for exposure (ORe) was 1 or 2, that for genotype (ORg) was 1 and that for interaction (ORi) was 2. Several sets of subjects were selected for demonstrating ORs and 95% confidence intervals, in such a way that point estimates became close to the above-adopted ORs.
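One way to arrive at cell counts of this kind is to compute the expected counts implied by the chosen prevalences and ORs and then round them. The sketch below does this under two stated assumptions: the controls reflect the population prevalences with exposure and genotype independent, and each case cell is proportional to the corresponding control cell multiplied by its OR. The function `expected_cells` is hypothetical and is not claimed to be the procedure actually used to select the subjects in Table 2.

```python
def expected_cells(n_cases, n_controls, e, g, or_e, or_g, or_i):
    """Expected (cases, controls) cell counts in the order a, b, c, d of Table 1."""
    # Controls: exposure (e) and genotype (g) assumed independent.
    probs = [(1 - e) * (1 - g), e * (1 - g), (1 - e) * g, e * g]
    controls = [n_controls * p for p in probs]
    # Cases: each cell weighted by its odds relative to the unexposed
    # subjects without the genotype (multiplicative model).
    rel_odds = [1.0, or_e, or_g, or_e * or_g * or_i]
    weights = [p * r for p, r in zip(probs, rel_odds)]
    total = sum(weights)
    cases = [n_cases * w / total for w in weights]
    return cases, controls


cases, controls = expected_cells(200, 200, e=0.1, g=0.1, or_e=1, or_g=1, or_i=2)
print([round(x) for x in cases])     # [160, 18, 18, 4]
print([round(x) for x in controls])  # [162, 18, 18, 2]
```

For this combination the rounded counts coincide with the first row of Table 2. Other combinations may differ by one subject after rounding, because the rounded cells need not sum to exactly 200.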

Table 1. Notations used for definitions of odds ratios (OR) and an interaction
Exposure   Genotype   Cases   Controls
No         No         a1      a2
Yes        No         b1      b2
No         Yes        c1      c2
Yes        Yes        d1      d2
OR for exposure:
    Among those without genotype: a2b1/a1b2
    Among those with genotype: c2d1/c1d2
OR for genotype:
    Among the unexposed: a2c1/a1c2
    Among the exposed: b2d1/b1d2
Interaction:
    (c2d1/c1d2)/(a2b1/a1b2)
    = (b2d1/b1d2)/(a2c1/a1c2)
    = (a1d1/b1c1)/(a2d2/b2c2)

Cancer case-control studies of genetic polymorphism and lifestyle were searched for in MEDLINE, and four papers that listed the numbers of subjects according to polymorphism and smoking habit (5-8) were used as examples. The polymorphism was classified into two categories when more than two categories were used in the papers. Age and other confounding factors were not adjusted for, because the data were not available. This did not detract from our analysis, because the purpose was not to provide a real estimate of the interaction in each study, but to compare estimates between two different study designs.

An unconditional logistic model was used for estimating ORs for exposure, genotype and gene-environment interaction. For case-control studies, two models were adopted, a two-parameter model of exposure and genotype and a three-parameter model of exposure, genotype and interaction. The ORs estimated using the two-parameter case-control design were termed marginal. The analysis was conducted with the SAS Logistic Procedure (9).
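The three analyses can be written out as follows. This is a sketch in our own notation: D denotes case status, E exposure and G genotype (each coded 0 or 1), and the coefficient symbols are assumptions introduced here. In particular, the choice of genotype as the dependent variable in the case-only model is one possible formulation; regressing exposure on genotype among cases gives the same OR.

```latex
\begin{align*}
\text{Three-parameter case-control model:}\quad
  & \operatorname{logit} P(D=1 \mid E, G)
    = \beta_0 + \beta_e E + \beta_g G + \beta_i E G, \\
  & OR_e = e^{\beta_e},\quad OR_g = e^{\beta_g},\quad OR_i = e^{\beta_i} \\[4pt]
\text{Two-parameter case-control model:}\quad
  & \operatorname{logit} P(D=1 \mid E, G)
    = \beta'_0 + \beta'_e E + \beta'_g G, \\
  & OR'_e = e^{\beta'_e},\quad OR'_g = e^{\beta'_g} \\[4pt]
\text{Case-only model:}\quad
  & \operatorname{logit} P(G=1 \mid E, D=1) = \alpha + \gamma E, \\
  & e^{\gamma} = \frac{a_1 d_1}{b_1 c_1}
    \;(= OR_i \text{ when } E \text{ and } G \text{ are independent among controls})
\end{align*}
```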

Table 2. Estimated odds ratios (OR) and 95% confidence intervals (95%CI) from hypothetical data according to the study design and model applied
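Column groups, left to right: cases (a1 b1 c1 d1); controls (a2 b2 c2 d2); prevalence (e g); case-control study, three-parameter model (ORe, ORg, ORi with 95% CI); case-control study, two-parameter model (OR'e, OR'g with 95% CI); case-only study (ORi with 95% CI)
a1 b1 c1 d1 a2 b2 c2 d2 e g ORe ORg ORi (95% CI) OR'e OR'g (95% CI) ORi (95% CI)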
Ncase = 200, Ncontrol = 200:
160 18 18 4 162 18 18 2 0.1 0.1 1.01 1.01 1.98 (0.28-13.8) 1.11 1.11 (0.58-2.10) 1.98 (0.60-6.48)
122 13 53 12 126 14 54 6 0.1 0.3 0.96 1.01 2.13 (0.57-7.93) 1.27 1.11 (0.73-1.70) 2.13 (0.91-4.96)
85 9 86 20 90 10 90 10 0.1 0.5 0.95 1.01 2.20 (0.63-7.67) 1.51 1.11 (0.75-1.64) 2.20 (0.95-5.10)
144 32 16 8 162 18 18 2 0.1 0.1 2.00 1.00 2.25 (0.37-13.6) 2.23 1.15 (0.61-2.18) 2.25 (0.89-5.71)
108 24 47 21 126 14 54 6 0.1 0.3 2.00 1.02 2.01 (0.60-6.78) 2.58 1.13 (0.74-1.74) 2.01 (1.02-3.96)
75 16 75 34 90 10 90 10 0.1 0.5 1.92 1.00 2.13 (0.68-6.67) 2.96 1.11 (0.75-1.67) 2.13 (1.08-4.17)
122 52 14 12 126 54 14 6 0.3 0.1 1.00 1.03 2.01 (0.54-7.45) 1.09 1.34 (0.72-2.48) 2.01 (0.87-4.64)
89 38 39 34 98 42 42 18 0.3 0.3 1.00 1.02 2.04 (0.84-4.97) 1.28 1.31 (0.86-2.00) 2.04 (1.13-3.71)
60 26 61 53 70 30 70 30 0.3 0.5 1.01 1.02 2.01 (0.86-4.67) 1.49 1.28 (0.86-1.91) 2.01 (1.11-3.61)
92 79 11 18 126 54 14 6 0.3 0.1 2.00 1.08 1.91 (0.52-6.94) 2.17 1.42 (0.77-2.64) 1.91 (0.85-4.28)
66 56 29 49 98 42 42 18 0.3 0.3 1.98 1.03 1.99 (0.83-4.80) 2.51 1.37 (0.89-2.10) 1.99 (1.11-3.56)
43 37 44 76 70 30 70 30 0.3 0.5 2.01 1.02 2.01 (0.87-4.63) 2.95 1.36 (0.91-2.06) 2.01 (1.13-3.57)
Ncase = 200, Ncontrol = 400:
160 18 18 4 324 36 36 4 0.1 0.1 1.01 1.01 1.98 (0.39-9.90) 1.11 1.11 (0.64-1.92) 1.98 (0.60-6.48)
122 13 53 12 252 28 108 12 0.1 0.3 0.96 1.01 2.13 (0.70-6.44) 1.28 1.12 (0.77-1.61) 2.13 (0.91-4.96)
85 9 86 20 180 20 180 20 0.1 0.5 0.95 1.01 2.20 (0.76-6.38) 1.51 1.11 (0.79-1.57) 2.20 (0.95-5.10)
144 32 16 8 324 36 36 4 0.1 0.1 2.00 1.00 2.25 (0.54-9.43) 2.23 1.17 (0.68-2.02) 2.25 (0.89-5.71)
108 24 47 21 252 28 108 12 0.1 0.3 2.00 1.02 2.01 (0.75-5.38) 2.58 1.15 (0.79-1.66) 2.01 (1.02-3.96)
75 16 75 34 180 20 180 20 0.1 0.5 1.92 1.00 2.13 (0.83-5.44) 2.97 1.13 (0.80-1.61) 2.13 (1.08-4.17)
122 52 14 12 252 108 28 12 0.3 0.1 1.00 1.03 2.01 (0.67-6.04) 1.09 1.34 (0.79-2.27) 2.01 (0.87-4.64)
89 38 39 34 196 84 84 36 0.3 0.3 1.00 1.02 2.04 (0.96-4.35) 1.29 1.32 (0.92-1.89) 2.04 (1.13-3.71)
60 26 61 53 140 60 140 60 0.3 0.5 1.01 1.02 2.01 (0.97-4.15) 1.50 1.30 (0.92-1.83) 2.01 (1.11-3.61)
92 79 11 18 252 108 28 12 0.3 0.1 2.00 1.08 1.91 (0.65-5.60) 2.17 1.45 (0.86-2.44) 1.91 (0.85-4.28)
66 56 29 49 196 84 84 36 0.3 0.3 1.98 1.03 1.99 (0.95-4.20) 2.53 1.41 (0.98-2.02) 1.99 (1.11-3.56)
43 37 44 76 140 60 140 60 0.3 0.5 2.01 1.02 2.01 (0.98-4.11) 2.97 1.41 (0.99-2.01) 2.01 (1.13-3.57)
Ncase, number of cases; Ncontrol, number of controls; a1, b1, c1, d1, a2, b2, c2 and d2, see Table 1; e, prevalence of exposure under study; g, prevalence of genotype under study; ORe, odds ratio of exposure among those without genotype; ORg, odds ratio of genotype among the unexposed; ORi, odds ratio of interaction; 95% CI, 95% confidence interval; OR'e, marginal odds ratio for exposure; OR'g, marginal odds ratio for genotype.

Table 3. Estimates for odds ratios (OR) and 95% confidence intervals (95% CI) for published cancer case-control studies
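Column groups, left to right: first author*; cases (a1 b1 c1 d1); controls (a2 b2 c2 d2); case-control study, three-parameter model (ORe, ORg, ORi with 95% CI); case-control study, two-parameter model (OR'e, OR'g with 95% CI); case-only study (ORi with 95% CI)
First author* a1 b1 c1 d1 a2 b2 c2 d2 ORe ORg ORi (95% CI) OR'e OR'g (95% CI) ORi (95% CI)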
Hildesheim 152 185 22 5 166 143 3 6 1.41 8.01 0.08 (0.01-0.45) 1.29 2.91 (1.34-6.30) 0.19 (0.07-0.51)
Wu 1 15 7 114 13 29 73 91 6.72 1.25 1.94 (0.20-19.0) 12.2 2.31 (1.20-4.45) 1.09 (0.13-9.45)
Taylor 17 92 22 99 30 64 44 65 2.54 0.88 1.20 (0.49-2.96) 2.81 1.01 (0.69-1.49) 0.83 (0.42-1.66)
Sugimura 13 206 1 27 69 109 3 4 10.0 1.77 2.02 (0.15-26.5) 10.5 3.14 (1.24-7.94) 1.70 (0.21-13.5)
*Hildesheim et al. (5), nasopharyngeal cancer in Taiwan, CYP2E1 and smoking; Wu et al. (6), lung cancer in the United States, CYP2E1 and smoking; Taylor et al. (7), bladder cancer in the United States, NAT2 and smoking; Sugimura et al. (8), lung cancer in Japan, CYP1A1 and smoking. Abbreviations are the same as in Table 2.

RESULTS

Hypothetical Data

Notations a1, b1, c1, d1, a2, b2, c2 and d2 in Table 2 show the numbers of cases and controls according to the exposure and genotype definitions in Table 1. Two groups of data are listed, one for 200 cases and 200 controls and the other for 200 cases and 400 controls. With the former, no listed combination showed a statistically significant interaction in the three-parameter case-control analysis. The two-parameter case-control analysis gave an elevated point estimate for the marginal OR'g, but it was not significant. These case-control analyses would therefore conclude that genotype was not significantly associated with the risk of disease.

Analysis with the case-only study gave the same ORs for interaction, but significance was achieved for some of the datasets. As shown in Table 2, a significant interaction was more likely to be demonstrated when the prevalence of exposure and genotype in the population was high. In such combinations, the analysis based on the case-only study was found to be fruitful, although it did not provide ORs for exposure or genotype alone. The statistical power was compared between case-control and case-only studies by means of the standard error of log(ORi). The standard errors obtained in the case-only study were 0.52 to 0.70 times those in the case-control study with 200 cases and 200 controls in Table 2.

With 400 controls, the results were unchanged except that the 95% confidence intervals of the case-control estimates became narrower than with 200 controls. The standard errors of log(ORi) in the case-control study with 200 cases and 400 controls were 1.06-1.54 times larger than in the case-only study. For the purpose of detecting interaction, the analysis based on the case-only study was found to be more powerful.
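The standard error comparison can be checked by hand for any row of Table 2. The sketch below uses the fourth row (e = 0.1, g = 0.1, ORe = 2) and assumes the usual large-sample variance of a log OR, i.e. the sum of the reciprocals of all cell counts involved. The helper `se_log_or` is a name chosen here for illustration.

```python
import math


def se_log_or(*tables):
    """Standard error of a log OR built from one or more 2x2 tables of counts."""
    return math.sqrt(sum(1.0 / n for table in tables for n in table))


cases = (144, 32, 16, 8)
controls_200 = (162, 18, 18, 2)
controls_400 = (324, 36, 36, 4)

se_case_only = se_log_or(cases)

# Case-only SE relative to the case-control SE with 200 controls
ratio_200 = se_case_only / se_log_or(cases, controls_200)
# Case-control SE with 400 controls relative to the case-only SE
ratio_400 = se_log_or(cases, controls_400) / se_case_only

print(round(ratio_200, 2))  # 0.52
print(round(ratio_400, 2))  # 1.54
```

Doubling the controls halves only the control part of the variance. The case part is untouched, which is why the case-control standard error stays above the case-only one however many controls are added.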

Examples from Published Studies

Comparisons between case-control and case-only studies were conducted for four reported cancer studies on polymorphism and smoking (Table 3). There were no substantial differences in the point estimates for interaction between the two designs, and their 95% confidence intervals overlapped. However, the standard error of log(ORi) was smaller with the case-only study. In these data sets, there were no associations between the polymorphism and smoking (p > 0.1).

DISCUSSION

Epidemiological studies have identified many risk factors for cancers from records and/or questionnaires (10) and in several cases, including smoking, some dietary components, hazardous agents, reproductive history and family history, the link could be appreciated only on the basis of a document-based approach. The next step is to identify those individuals sensitive to environmental factors, i.e. to examine gene-environment interactions. Molecular genetic technology has opened the door for researchers to a new field of research into interactions between genetic traits and exposure, a field that is constantly expanding as more genes and polymorphisms are identified. New study designs are clearly necessary to cope with this situation.

It is not always necessary to examine the contribution of well established risk factors such as smoking when an interaction is the main focus of study, since this may not help with the elucidation of biological mechanisms and cancer prevention. When associations with exposure or genotype are to be measured separately in the same dataset studied for interaction, an alternative incomplete-data case-control design is applicable, where data on exposure only or genotype only are measured among controls (11).

As shown in this paper, case-only studies provide narrower confidence intervals than their case-control counterparts. The standard errors of log(ORi) in our hypothetical case-only study with 200 cases were 0.52-0.70 times those of a case-control study with the same number of cases and controls. Thus, the case-control study used twice as many subjects yet achieved only about half to two thirds of the precision in the estimate. Comparison of the case-only study with 200 cases and the case-control study with 200 cases and 400 controls showed that the case-control study, with three times as many subjects, still gave lower precision in the estimate. Similar findings have been reported with regard to sample size calculation for interaction detection; a case-only study requires fewer cases than the number of cases in the case-control studies with two controls per case (1).

A case-only study provides valid estimates only on the assumption that genotype and exposure are independently distributed in the study population. If they are associated, the odds ratio for interaction derived from a case-only study would be biased depending on the strength of the association. However, the association, e.g. between ALDH2 genotype and alcohol drinking, seems to be the exception rather than the rule. In four studies used as examples in this paper, no significant association was observed between the polymorphism and smoking habit. If an association was found, it would be of major interest in terms of biological mechanism and disease prevention.
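The size of this bias follows from the definition in Materials and Methods: the case-only OR equals the interaction multiplied by the OR between genotype and exposure among controls. The sketch below illustrates the decomposition with the counts of Hildesheim et al. in Table 3. The helper `odds_ratio` is a name chosen here, and the control OR in this example rests on very small cells (3 and 6 controls with the genotype), so it is imprecise and is shown only to make the arithmetic concrete.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio a*d/(b*c) for cells in the order of Table 1."""
    return (a * d) / (b * c)


# Hildesheim et al. (5), counts from Table 3
cases = (152, 185, 22, 5)
controls = (166, 143, 3, 6)

or_cases = odds_ratio(*cases)        # what a case-only study estimates
or_controls = odds_ratio(*controls)  # genotype-exposure association in controls
or_interaction = or_cases / or_controls

print(round(or_cases, 2))        # 0.19
print(round(or_interaction, 2))  # 0.08
```

When the control OR is 1 the two estimates coincide. Any departure from 1 carries over multiplicatively into the case-only estimate.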

In case-control studies concerning genetic traits and diseases, there is uncorrectable confounding due to 'population stratification' or 'population admixture' (4,12). Some subgroups may have different patterns of genotype frequency, whose members have a high risk condition due to lifestyle and/or other genes. When a genotype frequent in the subgroup is studied in the population including the subgroup, a spurious association may be observed between the genotype and disease in case-control studies. This is analogous to case-control studies without adjustment for age and gender. It is easy to imagine that confounding will occur even if the controls are sampled randomly from persons without the disease under study.

In conclusion, the present investigation demonstrated that the case-only study is powerful with regard to the detection of interactions. The approach will become more important as genes and polymorphisms are identified at an exponentially increasing speed, and it appears particularly suitable for screening purposes. Understanding its advantages and disadvantages is now essential for clinical researchers, epidemiologists and all health professionals engaged in disease prevention.

References

1. Yang Q, Khoury MJ, Flanders WD. Sample size requirements in case-only designs to detect gene-environment interaction. Am J Epidemiol 1997;146:713-20.

2. Khoury MJ, Flanders WD. Nontraditional epidemiologic approaches in the analysis of gene-environment interaction: case-control studies with no controls! Am J Epidemiol 1996;144:207-13.

3. Yang Q, Khoury MJ. Evolving methods in genetic epidemiology. III. Gene-environment interaction in epidemiologic research. Epidemiol Rev 1997;19:33-43.

4. Andrieu N, Goldstein AM. Epidemiologic and genetic approaches in the study of gene-environment interaction: an overview of available methods. Epidemiol Rev 1998;20:137-47.

5. Hildesheim A, Anderson LM, Chen C-J, Cheng Y-J, Brinton LA, Daly AK, et al. CYP2E1 genetic polymorphisms and risk of nasopharyngeal carcinoma in Taiwan. J Natl Cancer Inst 1997;89:1207-12.

6. Wu X, Shi H, Jiang H, Kemp B, Hong WK, Delclos GL, et al. Associations between cytochrome P4502E1 genotype, mutagen sensitivity, cigarette smoking and susceptibility to lung cancer. Carcinogenesis 1997;18:967-73.

7. Taylor JA, Umbach DM, Stephens E, Castronio T, Paulson D, Robertson C, et al. The role of N-acetylation polymorphisms in smoking-associated bladder cancer: evidence of a gene-gene-exposure three-way interaction. Cancer Res 1998;58:3603-10.

8. Sugimura H, Wakai K, Genka K, Nagura K, Igarashi H, Nagayama K. Association of Ile462Val (Exon 7) polymorphism of cytochrome P450 IA1 with lung cancer in the Asian population: further evidence from a case-control study in Okinawa. Cancer Epidemiol Biomarkers Prev 1998;7:413-17.

9. SAS/STAT User's Guide, Version 6. Cary, NC: SAS Institute 1990.

10. Schottenfeld D, Fraumeni JF Jr, editors. Cancer Epidemiology and Prevention. Philadelphia: Saunders 1982.

11. Umbach DM, Weinberg CR. Designing and analysing case-control studies to exploit independence of genotype and exposure. Stat Med 1997;16:1731-43.

12. Lander ES, Schork NJ. Genetic dissection of complex traits. Science 1994;265:2037-48.


Received May 24, 1999; accepted July 5, 1999
For reprints and all correspondence: Nobuyuki Hamajima, Division of Epidemiology, Aichi Cancer Center Research Institute, 1-1 Kanokoden, Chikusa-ku, Nagoya 464-8681, Japan. E-mail: nhamajim{at}aichi-cc.pref.aichi.jp


This page is run by Oxford University Press, Great Clarendon Street, Oxford OX2 6DP, as part of the OUP Journals
Comments and feedback: jnl.info{at}oup.co.uk
Last modification: 30 Nov 1999
Copyright© 1999 Foundation for Promotion of Cancer Research.


