GUIDE Classification and Regression Trees and Forests (version 42.6)

© Wei-Yin Loh 1997-2024

Photo by Haoyang Fan, Xu He, Dong Liu, and Wenwen Zhang, taken on Sentosa Island, Singapore, March 22, 2014

GUIDE is a multi-purpose machine learning algorithm for constructing classification and regression trees and forests. It is designed and maintained by Wei-Yin Loh at the University of Wisconsin, Madison. GUIDE stands for Generalized, Unbiased, Interaction Detection and Estimation.

Development of GUIDE is supported in part by research grants from the U.S. Army Research Office, National Science Foundation, National Institutes of Health, Bureau of Labor Statistics, USDA Economic Research Service, and Eli Lilly. Work on precursors of GUIDE was additionally supported by IBM and Pfizer.

Video lectures:

  1. Six-hour tutorial delivered at the Institute of Mathematical Sciences (Singapore), December 2021 (pdf slides here)
  2. Eight-hour lectures delivered at Cambridge (UK), May 2022

GUIDE properties and features:

  1. Choice of classification or regression trees
  2. Negligible bias in split variable selection
  3. Importance ranking and identification of unimportant variables
  4. Power to detect local interactions between pairs of predictor variables
  5. Ability to use ordered (continuous) and unordered (categorical) predictor variables
  6. Automatic handling of missing values, including splits on missingness
  7. Automatic prediction for new (unseen) samples
  8. Choice of weighted least squares (Gaussian), least median of squares, Poisson, quantile (including median), proportional hazards, or multi-response (e.g., longitudinal) regression tree models
  9. Choice of piecewise constant, best simple polynomial, multiple, or stepwise linear regression models
  10. Choice of roles for predictor variables (splitting only, node modeling only, both, or none)
  11. Choice of using categorical variables for splitting only or both splitting and fitting through dummy 0-1 vectors (ANCOVA)
  12. Choice of stopping rules: no pruning, pruning by cross-validation, or pruning with a test sample
  13. Choice of batch or interactive mode of operation
  14. On-the-fly generation of products and powers of predictor variables as regressor variables
  15. Generation of LaTeX source code for the tree diagrams (MiKTeX recommended for Windows). The LaTeX code requires the PSTricks package, which is included in most LaTeX distributions; see the PSTricks User Guide and the TUG India documentation (especially Chapter 11) for excellent introductions to PSTricks. The compiled PostScript (.ps) files may be converted to PDF with ps2pdf (which comes with Ghostscript) or to Enhanced Metafile (.emf) format with pstoedit; EMF format is best for use in Word and PowerPoint documents. For a short introduction to LaTeX, look here.
  16. Generation of R source code for prediction of future cases
  17. Free executables for Windows, Mac, and Linux (see below)
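The PostScript-to-PDF/EMF conversion described in item 15 can be sketched from the command line. The file name tree.tex is a hypothetical placeholder for whatever LaTeX source GUIDE writes in your session, and each step is guarded so it is skipped when the required tool or input file is absent:

```shell
# Sketch: compiling GUIDE's PSTricks tree diagram and converting it.
# "tree" is a hypothetical base name; substitute the file GUIDE produced.
TREE=tree

# PSTricks requires the latex -> dvips route (not pdflatex).
if command -v latex >/dev/null && [ -f "$TREE.tex" ]; then
  latex "$TREE.tex" && dvips "$TREE.dvi" -o "$TREE.ps"
fi

# ps2pdf ships with Ghostscript.
if command -v ps2pdf >/dev/null && [ -f "$TREE.ps" ]; then
  ps2pdf "$TREE.ps" "$TREE.pdf"
fi

# pstoedit's emf driver writes Enhanced Metafiles for Word/PowerPoint.
if command -v pstoedit >/dev/null && [ -f "$TREE.ps" ]; then
  pstoedit -f emf "$TREE.ps" "$TREE.emf"
fi
```

Because PSTricks operates at the PostScript level, compiling directly with pdflatex will not work; the latex/dvips route above is the standard workaround.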

See Table 1 for a feature comparison between GUIDE and other classification tree algorithms.

See Table 2 for a feature comparison between GUIDE and other regression tree algorithms.

Documentation:

  1. Baker, T. B., Loh, W.-Y., et al. (2023), A machine learning analysis of correlates of mortality among patients hospitalized with COVID-19. Scientific Reports, vol. 13, 4080. [An application of GUIDE.]
  2. Loh, W.-Y. (2023), Logistic regression tree analysis. In Springer Handbook of Engineering Statistics, 2nd ed., H. Pham, (Ed.), 593-604.
  3. Loh, W.-Y. and Zhou, P. (2021), Variable importance scores. Journal of Data Science, vol. 19, 4, 569-592.
  4. Loh, W.-Y., Zhang, Q., Zhang, W. and Zhou, P. (2020), Missing data, imputation and regression trees, Statistica Sinica, vol. 30, 1697-1722.
  5. Loh, W.-Y. and Zhou, P. (2020), The GUIDE approach to subgroup identification. In Design and Analysis of Subgroups with Biopharmaceutical Applications, N. Ting, J. C. Cappelleri, S. Ho, and D.-G. Chen (Eds.) Springer, pp. 147-165.
  6. Loh, W.-Y., Cao, L. and Zhou, P. (2019), Subgroup identification for precision medicine: a comparative review of thirteen methods, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 9, 5, e1326. DOI
  7. Loh, W.-Y., Eltinge, J., Cho, M. and Li, Y. (2019), Classification and regression trees and forests for incomplete data from sample surveys, Statistica Sinica, vol. 29, 431-453. DOI
  8. Loh, W.-Y., Man, M. and Wang, S. (2019), Subgroups from regression trees with adjustment for prognostic effects and post-selection inference, Statistics in Medicine, vol. 38, 545-557. DOI
  9. Loh, W.-Y., Fu, H., Man, M., Champion, V. and Yu, M. (2016), Identification of subgroups with differential treatment effects for longitudinal and multiresponse variables, Statistics in Medicine, vol. 35, 4837-4855. DOI
  10. Loh, W.-Y., He, X., and Man, M. (2015), A regression tree approach to identifying subgroups with differential treatment effects, Statistics in Medicine, vol. 34, 1818-1833. DOI
  11. Loh, W.-Y. (2014), Fifty years of classification and regression trees (with discussion), International Statistical Review, vol. 82, 329-370. DOI
  12. Loh, W.-Y. and Zheng, W. (2013), Regression trees for longitudinal and multiresponse data, Annals of Applied Statistics, vol. 7, 496-522. DOI
  13. Loh, W.-Y. (2012), Variable selection for classification and regression in large p, small n problems, Lecture Notes in Statistics---Proceedings, A. Barbour, H. P. Chan and D. Siegmund (Eds.), vol. 205, Springer, pp. 133-157.
  14. Loh, W.-Y. (2011), Classification and regression trees, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol.1, 14-23. DOI
  15. Loh, W.-Y. (2010), Tree-structured classifiers, Wiley Interdisciplinary Reviews: Computational Statistics, vol.2, 364-369. DOI
  16. Loh, W.-Y. (2009), Improving the precision of classification trees, Annals of Applied Statistics, vol. 3, 1710-1737. DOI [The definitive reference for GUIDE classification.]
  17. Loh, W.-Y. (2008), Classification and regression tree methods, Encyclopedia of Statistics in Quality and Reliability, F. Ruggeri, R. Kenett, and F. W. Faltin (Eds.) Wiley, pp. 315-323.
  18. Loh, W.-Y. (2008), Regression by parts: Fitting visually interpretable models with GUIDE, Handbook of Computational Statistics, vol. III, 447-469, Springer.
  19. Loh, W.-Y., Chen, C.-W., and Zheng, W. (2007), Extrapolation errors in linear model trees, ACM Transactions on Knowledge Discovery in Data, vol. 1, issue 2, article 6. DOI
  20. Kim, H., Loh, W.-Y., Shih, Y.-S., and Chaudhuri, P. (2007), Visualizable and interpretable regression models with good prediction power, IIE Transactions, vol. 39, issue 6, 565-579. DOI. Datasets
  21. Loh, W.-Y. (2006), Regression tree models for designed experiments, Second Lehmann Symposium, Institute of Mathematical Statistics Lecture Notes-Monograph Series, vol. 49, 210-228.
  22. Loh, W.-Y. (2002), Regression trees with unbiased variable selection and interaction detection, Statistica Sinica, vol. 12, 361-386. [The definitive reference for GUIDE regression.]
  23. Chaudhuri, P. and Loh, W.-Y. (2002), Nonparametric estimation of conditional quantiles using quantile regression trees, Bernoulli, vol. 8, 561-576.
  24. Chaudhuri, P., Lo, W.-D., Loh, W.-Y., and Yang, C.-C. (1995), Generalized regression trees, Statistica Sinica, vol. 5, 641-666.
  25. Chaudhuri, P., Huang, M.-C., Loh, W.-Y., and Yao, R. (1994), Piecewise-polynomial regression trees, Statistica Sinica, vol. 4, 143-167.
  26. Loh, W.-Y., and Vanichsetakul, N. (1988), Tree-structured classification via generalized discriminant analysis (with discussion), Journal of the American Statistical Association, vol. 83, 715-728. [I began my journey with this article.]
  • (Mostly) third-party applications of GUIDE, QUEST, CRUISE, and LOTUS: Look here
  • GUIDE compiled binaries: The following executable files may be freely distributed but not sold for profit.
  • guide.gz for 64-bit Linux (compiled with Intel Fortran compiler 18.0.1, Ubuntu 22.04.3). Puts scratch files in TMPDIR if the environment variable is defined, otherwise in the current folder.
  • guide.gz for 64-bit Linux (compiled with gfortran 11.4.0, Ubuntu 22.04.3 LTS). Puts scratch files in TMPDIR if the environment variable is defined, otherwise in /tmp.
  • guide.gz for 64-bit Linux (compiled with gfortran 7.5.0, Ubuntu 18.04.6 LTS). Puts scratch files in TMPDIR if the environment variable is defined, otherwise in /tmp.
  • guide.gz for macOS Sonoma 14.4.1 with Apple Arm processors (compiled with NAG Fortran 7.1)
  • guide.gz for macOS Monterey 12.7.4 with Intel processors (compiled with NAG Fortran 6.2)
  • guide.gz for macOS Monterey 12.7.4 with Intel processors (compiled with gfortran 12.1 and Xcode 14.2; see manual)
  • guide.gz for macOS Big Sur 11.7.10 (compiled with gfortran 11.2 and Xcode 13.2.1; see manual)
  • guide.zip for 64-bit Windows 10, compiled with gfortran 11.2.0. Puts scratch files in home folder.
  • guide.gz (requires Windows Subsystem for Linux) for Windows 11, compiled with gfortran 11.4.0 and Ubuntu 22.04
  • GUIDE manual: guideman.pdf
  • Data files used in manual: datafiles.zip (Mac), datafiles.zip (Windows) or datafiles.tar.gz (Linux)
  • GUIDE revision history: history.txt
  • Earlier algorithms developed by Wei-Yin Loh and his students:

  • QUEST: Binary classification tree
  • CRUISE: Classification tree that splits each node into two or more subnodes
  • LOTUS: Logistic regression tree
  • License:

    Copyright (c) 1997-2024 Wei-Yin Loh. All rights reserved.

    Redistribution and use in binary forms, with or without modification, are permitted provided that the following condition is met:

    Redistributions in binary form must reproduce the above copyright notice, this condition and the following disclaimer in the documentation and/or other materials provided with the distribution.

    THIS SOFTWARE IS PROVIDED BY WEI-YIN LOH "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL WEI-YIN LOH BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

    The views and conclusions contained in the software and documentation are those of the author and should not be interpreted as representing official policies, either expressed or implied, of the University of Wisconsin.

    Last modified: August 15, 2024