REFERENCES

Chapter VII - The Impact of Missing Data on Data Mining
Data Mining: Opportunities and Challenges
by John Wang (ed) 
Idea Group Publishing 2003

Adriaans, P. & Zantinge, D. (1997). Data mining. New York: Addison-Wesley.

Afifi, A. & Elashoff, R. (1966). Missing observations in multivariate statistics, I: Review of the literature. Journal of the American Statistical Association, 61, 595-604.

Agrawal, R., Imielinski, T., & Swami, A. (1993). Mining associations between sets of items in massive databases. Proceedings of the ACM SIGMOD International Conference on Management of Data, Washington, DC, 207-216.

Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., & Verkamo, A. (1995). Fast discovery of association rules. Advances in Knowledge Discovery and Data Mining, Chapter 12. Cambridge, MA: AAAI/MIT Press.

Agrawal, R. & Srikant, R. (1994). Fast algorithms for mining association rules. Proceedings of the 20th International Conference on Very Large Databases, Santiago de Chile, Chile: Morgan Kaufmann.

Barnard, J., & Meng, X. (1999). Applications of multiple imputation in medical studies: From AIDS to NHANES. Statistical Methods in Medical Research, 8, 17-36.

Berry, M. & Linoff, G. (1997). Data mining techniques. New York: Wiley.

Berson, A., Smith, S., & Thearling, K. (2000). Building data mining applications for CRM. New York: McGraw-Hill.

Breiman, L., Friedman, J., Olshen, R., & Stone, C. (1984). Classification and regression trees. Monterey, CA: Wadsworth & Brooks/Cole Advanced Books & Software.

Brick, J. M. & Kalton, G. (1996). Handling missing data in survey research. Statistical Methods in Medical Research, 5, 215-238.

Chung, H. M. & Gray, P. (1999). Special section: Data mining. Journal of Management Information Systems, 16(1), 11-17.

Clogg, C., Rubin, D., Schenker, N., Schultz, B., & Weidman, L. (1991). Multiple imputation of industry and occupation codes in census public-use samples using Bayesian logistic regression. Journal of the American Statistical Association, 86(413), 68-78.

Cohen, J. & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences, 2nd ed. Hillsdale, NJ: Lawrence Erlbaum Associates.

Darling, C. B. (1997). Data mining for the masses. Datamation, 52(5).

David, M., Little, R., Samuhel, M., & Triest, R. (1986). Alternative methods for CPS income imputation. Journal of the American Statistical Association, 81, 29-41.

Dempster, A., Laird, N., & Rubin, D. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, B39, 1-38.

Dempster, A. & Rubin, D. (1983). Overview. In W. G. Madow, I. Olkin, & D. Rubin (eds.), Incomplete data in sample surveys, Vol. II: Theory and annotated bibliography. pp. 3-10. New York: Academic Press.

Diggle, P. & Kenward, M. (1994). Informative dropout in longitudinal data analysis (with discussion). Applied Statistics, 43, 49-94.

Dillman, D. A. (1999). Mail and Internet surveys: The tailored design method. New York, NY: John Wiley & Sons.

Ernst, L. (1980). Variance of the estimated mean for several imputation procedures. In the Proceedings of the Survey Research Methods Section, Alexandria, VA: American Statistical Association, 716-720.

Fayyad, U., Piatetsky-Shapiro, G. & Smyth, P. (1996). The KDD process for extracting useful knowledge from volumes of data. Communications of the ACM, 39(11), 27-34, November.

Flockhart, I. & Radcliffe, N. (1996). A genetic algorithm-based approach to data mining. Proceedings of the ACM SIGMOD International Conference on Management of Data, 299-302.

Ford, B. (1981). An overview of hot deck procedures. In W. G. Madow, I. Olkin, & D. Rubin (eds.), Incomplete data in sample surveys, Vol. II: Theory and annotated bibliography. New York: Academic Press.

Ghahramani, Z. & Jordan, M. (1997). Mixture models for learning from incomplete data. In J. Cowan, G. Tesauro, & J. Alspector (eds.), Advances in neural information processing systems 6. pp. 120-127. San Mateo, CA: Morgan Kaufmann.

Graham, J., Hofer, S., Donaldson, S., MacKinnon, D., & Schafer, J. (1997). Analysis with missing data in prevention research. In K. Bryant, W. Windle, & S. West (eds.), New methodological approaches to alcohol prevention research. Washington, DC: American Psychological Association.

Graham, J., Hofer, S., & Piccinin, A. (1994). Analysis with missing data in drug prevention research. In L. M. Collins & L. Seitz (eds.), Advances in data analysis for prevention intervention research. NIDA Research Monograph, Series (#142). Washington, DC: National Institute on Drug Abuse.

Groth, R. (2000). Data mining: Building competitive advantage. Upper Saddle River, NJ: Prentice Hall.

Hair, J., Anderson, R., Tatham, R., & Black, W. (1998). Multivariate data analysis. Upper Saddle River, NJ: Prentice Hall.

Han, J. & Kamber, M. (2001). Data mining: Concepts and techniques. San Francisco: Academic Press.

Hansen, M., Madow, W., & Tepping, J. (1983). An evaluation of model-dependent and probability-sampling inferences in sample surveys. Journal of the American Statistical Association, 78, 776-807.

Hartley, H. & Hocking, R. (1971). The analysis of incomplete data. Biometrics, 27, 783-808.

Haykin, S. (1994). Neural networks: A comprehensive foundation. New York: Macmillan Publishing.

Heitjan, D. F. (1997). Annotation: What can be done about missing data? Approaches to imputation. American Journal of Public Health, 87(4), 548-550.

Herzog, T., & Rubin, D. (1983). Using multiple imputations to handle nonresponse in sample surveys. In G. Madow, I. Olkin, & D. Rubin (eds.), Incomplete data in sample surveys, Volume 2: Theory and bibliography. pp. 209-245. New York: Academic Press.

Holland, J. (1975). Adaptation in natural and artificial systems. Ann Arbor, MI: University of Michigan Press.

Howell, D.C. (1998). Treatment of missing data [Online]. Available http://www.uvm.edu/~dhowell/StatPages/More_Stuff/Missing_Data/Missing.html [2001, September 1].

Iannacchione, V. (1982). Weighted sequential hot deck imputation macros. In the Proceedings of the SAS Users Group International Conference, San Francisco, CA, 7, 759-763.

Jenkins, C. R. & Dillman, D. A. (1997). Towards a theory of self-administered questionnaire design. In E. Lyberg, P. Biemer, M. Collins, D. Leeuw, C. Dippo, N. Schwartz & D. Trewin (eds.), Survey measurement and process quality. New York: John Wiley & Sons.

Kalton, G. & Kasprzyk, D. (1982). Imputing for missing survey responses. In the Proceedings of the Section on Survey Research Methods, Alexandria, VA: American Statistical Association, pp. 22-31.

Kalton, G. & Kasprzyk, D. (1986). The treatment of missing survey data. Survey Methodology, 12, 1-16.

Kalton, G. & Kish, L. (1981). Two efficient random imputation procedures. In the Proceedings of the Survey Research Methods Section, Alexandria, VA: American Statistical Association, pp. 146-151.

Kass, G. (1980). An exploratory technique for investigating large quantities of categorical data. Applied Statistics, 29, 119-127.

Kim, Y. (2001). The curse of the missing data [Online]. Available http://209.68.240.11:8080/2ndMoment/978476655/addPostingForm [2001, September 1].

Li, K. (1985). Hypothesis testing in multiple imputation - With emphasis on mixed-up frequencies in contingency tables. Ph.D. thesis, The University of Chicago, Chicago, IL.

Little, R. (1982). Models for nonresponse in sample surveys. Journal of the American Statistical Association, 77, 237-250.

Little, R. (1992). Regression with missing X's: A review. Journal of the American Statistical Association, 87, 1227-1237.

Little, R. (1995). Modeling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association, 90, 1112-1121.

Little, R. & Rubin, D. (1987). Statistical analysis with missing data. New York: Wiley.

Little, R. & Rubin, D. (1989). The analysis of social science data with missing values. Sociological Methods and Research, 18, 292-326.

Loh, W. & Shih, Y. (1997). Split selection methods for classification trees. Statistica Sinica, 7, 815-840.

Loh, W. & Vanichsetakul, N. (1988). Tree-structured classification via generalized discriminant analysis (with discussion). Journal of the American Statistical Association, 83, 715-728.

Masters, T. (1995). Neural, novel, and hybrid algorithms for time series predictions. New York: Wiley.

Michalewicz, Z. (1994). Genetic algorithms + data structures = evolution programs. New York: Springer-Verlag.

Morgan, J. & Messenger, R. (1973). THAID: A sequential analysis program for the analysis of nominal scale dependent variables. Technical report, Institute of Social Research, University of Michigan, Ann Arbor, MI.

Morgan, J. & Sonquist, J. (1963). Problems in the analysis of survey data and a proposal. Journal of the American Statistical Association, 58, 415-434.

Nie, N., Hull, C., Jenkins, J., Steinbrenner, K., & Bent, D. (1975). SPSS, 2nd ed. New York: McGraw-Hill.

Orchard, T. & Woodbury, M. (1972). A missing information principle: Theory and applications. In the Proceedings of the 6th Berkeley Symposium on Mathematical Statistics and Probability, University of California, Berkeley, CA, 1, 697-715.

Pennell, S. (1993). Cross-sectional imputation and longitudinal editing procedures in the survey of income and program participation. Technical report, Institute of Social Research, University of Michigan, Ann Arbor, MI.

Ripley, B. (1996). Pattern recognition and neural networks. Cambridge, UK: Cambridge University Press.

Roth, P. (1994). Missing data: A conceptual review for applied psychologists. Personnel Psychology, 47, 537-560.

Royall, R. & Herson, J. (1973). Robust estimation from finite populations. Journal of the American Statistical Association, 68, 883-889.

Rubin, D. (1978). Multiple imputations in sample surveys - A phenomenological Bayesian approach to nonresponse. Imputation and Editing of Faulty or Missing Survey Data, U.S. Department of Commerce, 1-23.

Rubin, D. (1986). Statistical matching using file concatenation with adjusted weights and multiple imputations. Journal of Business and Economic Statistics, 4, 87-94.

Rubin, D. (1996). Multiple imputation after 18+ years (with discussion). Journal of the American Statistical Association, 91, 473-489.

Rubin, D. & Schenker, N. (1986). Multiple imputation for interval estimation from simple random samples with ignorable nonresponse. Journal of the American Statistical Association, 81, 366-374.

Sande, L. (1982). Imputation in surveys: Coping with reality. The American Statistician, 36, 145-152.

Sande, L. (1983). Hot-deck imputation procedures. In W. G. Madow & I. Olkin (eds.), Incomplete data in sample surveys, Vol. 3, Proceedings of the Symposium on Incomplete Data: Panel on Incomplete Data, Committee of National Statistics, Commission on Behavioral and Social Sciences and Education, National Research Council, Washington, D.C., 1979, August 10-11, pp. 339-349. New York: Academic Press.

Schafer, J. (1997). Analysis of incomplete multivariate data. London: Chapman and Hall.

Schafer, J. (1999). Multiple imputation: A primer. Statistical Methods in Medical Research, 8, 3-15.

Schafer, J. & Olsen, M. (1998). Multiple imputation for multivariate missing-data problems: A data analyst's perspective. Multivariate Behavioral Research, 33, 545-571.

Sharpe, P. & Glover, R. (1999). Efficient GA based techniques for classification. Applied Intelligence, 11(3), 277-284.

Skapura, D. (1995). Building neural networks. New York: Addison-Wesley.

Statistical Services of University of Texas (2000). General FAQ #25: Handling missing or incomplete data [Online]. Available http://www.utexas.edu/cc/faqs/stat/general/gen25.html. [2001, September 1].

Sullivan, D. (2001). Document warehousing and text mining. New York: John Wiley & Sons.

Szpiro, G. (1997). A search for hidden relationships: Data mining with genetic algorithms. Computational Economics, 10(3), 267-277.

Tresp, V., Neuneier, R., & Ahmad, S. (1995). Efficient methods for dealing with missing data in supervised learning. In G. Tesauro, D. Touretzky, & T. Leen (eds.), Advances in neural information processing systems 7. pp. 689-696. Cambridge, MA: The MIT Press.

van Buuren, S., Boshuizen, H., & Knook, D. (1999). Multiple imputation of missing blood pressure covariates in survival analysis. Statistics in Medicine, 18, 681-694.

Warner, B., & Misra, M. (1996). Understanding neural networks as statistical tools. The American Statistician, 50, 284-293.

Westphal, C. & Blaxton, T. (1998). Data mining solutions. New York: Wiley.

Witten, I. & Frank, E. (2000). Data mining. San Francisco: Academic Press.

Wothke, W. (1998). Longitudinal and multi-group modeling with missing data. In T. D. Little, K. U. Schnabel, & J. Baumert (eds.), Modelling longitudinal and multiple group data: Practical issues, applied approaches and specific examples. Mahwah, NJ: Lawrence Erlbaum Associates.

ISBN: 1591400511
Year: 2003
Pages: 194
Authors: John Wang