Changes in version 41.2
1. Deleted a write statement that was left over from earlier versions of lsmod.
2. Made some cosmetic improvements.
3. Changed default rfnumvar in Guide random forest to be same as mtry in Breiman's random forest.
5. Changed default min_dat in random forest to max(1,n/100) for classification and max(5,n/100) for regression.
6. Reduced the number of split points for random forest from 100 to 10 at root node.
7. Reduced maximum number of split levels to 20.
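The new min_dat defaults in the list above can be sketched in Python. The function name default_min_dat and the rounding of n/100 downward (integer division) are assumptions made for illustration; the changelog does not say how n/100 is rounded.

```python
def default_min_dat(n: int, regression: bool) -> int:
    """Sketch of the version 41.2 default min_dat for random forest.

    Assumption: n/100 is rounded down (integer division).
    """
    lower_bound = 5 if regression else 1  # 5 for regression, 1 for classification
    return max(lower_bound, n // 100)


# Small samples fall back to the lower bound; larger ones use n/100.
print(default_min_dat(250, regression=False))
print(default_min_dat(250, regression=True))
print(default_min_dat(1000, regression=True))
```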
Changes in version 41.1
1. Changed stepwise procedure to fit model with one fewer variable if
fitting error is encountered. If this gives an error (e.g., due to
multicollinearity), a constant is fitted to the node.
2. Made saving regressor names the default in stepwise regression.
Changes in version 41.0
1. Corrected a bug to do with split search on periodic variables.
2. Made results invariant to the origin (e.g., 0 degrees) of periodic variables.
3. Removed options for dealing with missing values.
4. Prevented linear splits on P variables.
5. Made small changes to nearest-neighbor and kernel discriminant classification trees.
Changes in version 40.5
1. Allowed "e", "s" and "m" variables in data conversion option.
2. Refined split set selection for piecewise-constant quantile regression.
3. Corrected computation of residuals from quant_reg when there are weights.
Changes in version 40.4
1. Corrected the thresholds for 80% and 90% importance scores.
2. Added USE IEEE_ARITHMETIC and CALL
IEEE_SET_FLAG(IEEE_OVERFLOW,.FALSE.) to suppress overflow warnings.
3. Fixed a bug to do with constant variables when M variables are present.
4. Suppressed IEEE_OVERFLOW and IEEE_DENORMAL warnings in gfortran.
5. Skipped check for sum of left and right node sample sizes.
Changes in version 40.2
1. Changed missing value code assignment progress printing from every
10,000 records to every 5,000 records.
2. Increased LaTeX paper size to A2 and A1 for very large trees.
3. Corrected a bug in fit_model in select.f90.
4. Corrected a bug in output of predicted values for classification forests when there are missing values in dependent variable.
5. Added 80% and 90% thresholds to importance scores.
Changes in version 40.1
1. Fixed some bugs that necessitated input files to be recreated.
2. Corrected a missing line in R code for classification models.
Changes in version 40.0
1. For fitting piecewise linear proportional hazards and restricted mean models with censored response data, node mean imputation is the only option. This requires the structure of their input files to be changed.
2. For best simple polynomial models, writing the regressor names to a file is now the default.
Changes in version 39.0
1. Added relative hazards in terminal nodes of LaTeX tree for proportional hazards trees with constant fits.
2. Allowed logistic regression with B variables and no N or F variables.
3. Made E variable optional in logistic regression.
4. For piecewise simple polynomial linear, quantile and Poisson regression, made fitting a constant to cases with missing regressor values the default.
5. Corrected some bugs in poisson.f90 and prune.f90 for poisson and prop. hazards.
6. Added missing-value indicator variables (aka "mask" variables) to missing-value imputation option in multiple and stepwise methods.
7. Recommend stepwise or multiple linear models for prediction accuracy and constant models for interpretation.
Changes in version 38.1
1. For mean CV pruning, if pruned tree is trivial, choose smallest nontrivial tree with no larger CV error estimate.
2. If data have an uncensored response variable and an R variable and a Gi or Gs model is fitted without prognostic effects, the LaTeX tree diagram shows the regression coefficients in each terminal node.
Changes in version 38.0
1. Changed relative risk estimates to median survival times for censored response data.
2. Restored a sentence in the LaTeX figure caption on the meanings of node colors in regression trees.
Changes in version 37.3
1. Corrected a bug in latex output for simple linear proportional hazards regression when there is no R variable.
Changes in version 37.2
1. If priors are given through a file, the tree will print posterior probabilities in terminal nodes by default. Otherwise, sample proportions are printed.
2. Corrected a bug in fitted values from kernel and nearest-neighbor classification models.
Changes in version 37.1
1. Corrected output of minimum values of dependent variables in longitudinal models.
2. Removed variable nclass in drawtree since it is same as n_class.
Changes in version 37.0
1. Changed default pruning SE from 0.50 to 0.25.
2. Changed default mindat for constant models to 2.
3. For importance scoring, changed number of permutations to decrease with sample size at a hyperbolic rate: 300 for n <= 1000 observations, 100 for n=5000, 75 for n=10,000, 60 for n=25,000, etc.
4. Fixed a bug in latex code when treatment variable has more than 2 levels.
5. If pruned tree is trivial, the next smaller tree among the *, ++ and + trees is selected.
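The permutation counts quoted in item 3 above are all reproduced by the curve 50 + 250000/n. This formula is inferred from the four listed values only; it is not stated in the changelog, and the function name and the rounding to the nearest integer are assumptions.

```python
def permutations_for(n: int) -> int:
    """Hypothetical fit to the counts listed in item 3.

    Returns 300 for n <= 1000, otherwise 50 + 250000/n rounded to the
    nearest integer (the rounding rule is an assumption).
    """
    if n <= 1000:
        return 300
    return round(50 + 250_000 / n)


# Reproduces the four values given in the changelog: 300, 100, 75, 60.
for n in (1000, 5000, 10_000, 25_000):
    print(n, permutations_for(n))
```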
Changes in version 36.2.
1. If an N or S variable is constant or has all values missing and it has an associated M variable, changed the warning about the former to include a statement of whether the M variable is constant.
2. Stopped giving importance scores to variables that are constant or have all missing values.
3. If there are missing values in a split variable, require that the left and right subnodes each have at least min_dat non-missing values.
Changes in version 36.1.
1. For classification trees with two classes, the default LaTeX tree now shows the sample proportion of class 2 beside each terminal node.
2. Added warning if number of rows in data file does not match number of observations.
3. For R variable, changed minimum number of cases per treatment level to minimum fraction per treatment level, with default 0.20 * minimum treatment fraction.
4. For least squares with 2-level R variable, added mean of response and proportion of treatment cases below each terminal node of LaTeX tree.
5. Changed the prompts for LaTeX output to allow no node numbers.
6. Made cosmetic changes to text output of regression coefficients, t-stat, etc.
Changes in version 36.0.
1. Fixed a bug in importance scoring that missed interaction effects since version 34.0.
2. Increased number of permutation iterations to 300 for importance scoring.
3. Increased default number of split levels to 4 for importance scoring.
4. Added option to output LaTeX code for importance scoring.
5. Added distinction between "highly important" (99% confidence) and "likely important" (95% confidence) to importance scoring.
6. Changed Bonferroni alpha from 0.50 to 0.10.
7. Changed default colors to make them more color-blind friendly in
LaTeX diagrams.
8. Corrected a bug in constant quantile regression.
Changes in version 35.2.
1. Added a trap to abort if bandwidth routine applied to fewer than 2 observations.
2. Changed bandwidth of each class with fewer than 2 observations to average bandwidth of the other classes.
3. Corrected an error in printout of the maximum censored and uncensored times.
4. Made cosmetic changes to captions in LaTeX diagrams.
Changes in version 35.1.
1. Improved forest option so that specified fraction of variables is chosen to compete for splits at each node.
2. Made linear splits default in classification ensembles.
3. Doubled number of trees in ensembles if number of variables < 100 and sample size < 500.
4. Increased maximum number of split levels for ensembles to between 15 and 30.
5. Added out-of-bag (OOB) estimates of error to ensemble methods.
Changes in version 35.0.
1. Added option of restricted mean event time (RMET) for censored response variable, necessitating change in input files.
2. Changed splitting at root node: if interaction tests yield no splits, revert to best univariate split.
3. Corrected an error that concluded no important variables when all are important.
Changes in version 34.2.
1. Modified threshold computation for importance scores.
2. Changed definition of residual classes in Poisson Anscombe residuals to above or below the mean if all residuals have the same sign.
Changes in version 34.1.
1. Corrected a bug to do with interaction split variable selection.
2. Removed listing of interaction variables in LaTeX figure caption.
Changes in version 34.0.
1. Corrected a bug in Poisson and PH regression w.r.t. accumulating chisquared scores.
2. Reverted the weights for importance scoring from node sample size to its square root.
3. Replaced Satterthwaite approximation by a permutation distribution for importance threshold.
4. Added a question on randomization for importance scoring if there is an R variable.
5. Allowed M variables associated with C variables.
6. Corrected a bug to do with best polynomial fits for Poisson and PH models.
7. Changed default minimum node size to approx 1% of training sample size.
8. Corrected a bug to do with leftcat in choose_ocvar.
9. Changed default for classification to sample proportions in nodes of LaTeX trees.
10. Restrict importance and propensity scoring to equal priors.
11. Restrict propensity scoring to equal misclassification costs.
Changes in version 33.3.
1. Corrected a bug when reading data with missing weight values.
2. Added a list of split variables at root node in decreasing order of significance.
Changes in version 33.2.
1. Corrected a bug that occurs if an R variable is in the 1st column.
Changes in version 33.1.
1. Corrected a bug with printout of split categories if split is from interaction test.
2. Colored intermediate nodes wheat for interaction splits, lightgray for linear splits.
3. Automatically changed M variables to C if not preceded by N, P or S.
4. Corrected a bug in quantile linear regression that did not use imputation by default.
5. Corrected a bug in find_splitpt0.
6. Added a limit to number of iterations in quant_reg.
7. Corrected some bugs to do with M variables for non-P variables in updating termnodeid.
8. Made grammatical and cosmetic improvements to captions of latex diagrams.
Changes in version 33.0.
1. Changed default Bonferroni correction to trigger interaction and linear splits to alpha=0.10.
2. Added option to produce largest subtree with prespecified maximum number of terminal nodes.
3. Corrected a bug with misslab in exhaustcrim routine.
Changes in version 32.3.
1. Improved unbiasedness of importance scores.
2. Changed paper type from letter to a3 for large trees.
Changes in version 32.2.
1. Corrected a bug that prevented splits on categorical variables with more than 11 levels.
2. Corrected a bug that prevented top-ranked variables from being used to split the root node, if that option is chosen.
Changes in version 32.1.
1. Deleted an output line to fort.21 in pour_learn_kernel routine.
2. Corrected a bug in categorical variable selection when there are more than 2 classes and categorical variables with more than 11 levels.
3. Corrected an initialization bug in ldacont routine.
4. Changed fitted values outputs for classification to posterior probabilities and ensured that none is zero by mixing with class priors if needed.
5. Changed R code for classification to output estimated class posterior probabilities as well.
6. Reduced the number of iterations for importance scoring to 50 if sample size > 1000.
7. Added the word "weighted" to LaTeX caption if weights are used in least-squares regression.
Changes in version 32.0.
1. Added option to construct tree using the 2nd best split variable at the root node.
Changes in version 31.1.
1. Debiased importance scores.
2. Downweighted chi-squared scores by halving according to split level.
3. Corrected a condition on m1 /= mm1 in split selection that caused early stopping.
4. Added an asterisk to name of regressor in LaTeX code for best simple node models.
5. Colored intermediate nodes gray if split is linear or interaction.
6. Made some corrections to logistic module.
7. Changed chi-squared scores from interaction tests so that interacting variables have more similar final scores.
Changes in version 31.0.
1. Added option for logistic regression.
2. Corrected a bug in splits on categorical variable with fewer than 12 levels.
Changes in version 30.2.
1. Changed default min. no. cases per treatment at each node to min(20, max(2, v)), where v = 1/5 of smallest number of cases per treatment at root node.
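The rule above can be written out as a short Python sketch. The function name, the list-of-counts input, and rounding v down to an integer are assumptions for illustration; the changelog does not specify how 1/5 of the smallest count is rounded.

```python
from typing import Sequence


def default_min_cases_per_treatment(root_counts: Sequence[int]) -> int:
    """Sketch of the version 30.2 default: min(20, max(2, v)).

    root_counts holds the number of cases in each treatment at the root node.
    Assumption: v (1/5 of the smallest count) is rounded down.
    """
    v = min(root_counts) // 5
    return min(20, max(2, v))


# Smallest arm has 40 cases, so v = 8 and the default is 8.
print(default_min_cases_per_treatment([40, 60]))
```

The max(2, v) term keeps the default from dropping below 2 when a treatment arm is small, and the min(20, ...) term caps it for large samples.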
Changes in version 30.1.
1. Added some checks for excessive number of linear predictor variables.
2. Corrected a bug in bootstrap calibration.
3. For multiple linear poisson regression, fit constant model to node if there is a convergence problem.
4. Extended output format of fitted values to provide 3 digits in
exponent.
5. Corrected a bug that caused a segmentation fault in classification when option of not printing fitted values is chosen.
Changes in version 30.0.
1. Removed one prompt from multiple or longitudinal response option, requiring jump to version 30.
2. Splits on categorical variables with more than 11 levels changed to use splits on all LDA variables, with best split being the one minimizing total deviance.
3. Reduced the use of Wilson-Hilferty approximation.
4. Changed constant term to 0 at root node for proportional hazards
models.
5. Changed LaTeX trees to show treatment effects or hazard ratios when there is a treatment variable.
Changes in version 29.7.
1. If chosen-SE tree has no splits, the 0-SE tree is output.
2. Added number of training observations in caption of LaTeX diagram.
3. Added sorting of variable columns by name before CV subsetting.
4. Changed number of missing value column in summary output table to refer only to the observations used for training.
Changes in version 29.6.
1. Corrected a bug to do with piecewise simple polynomial least squares models.
2. Added output of proportions of observations in each arm when there is a treatment (R) variable.
Changes in version 29.5.
1. Ensured that the default split point selection for importance scoring is exhaustive search.
2. Improved variable selection procedure to look at splitting on next most significant variable if the current one does not yield admissible subnodes.
3. Changed default minimum node sample size to max(2,n/100).
Changes in version 29.4.
1. Corrected a bug caused by M variables in non-least squares models.
Changes in version 29.3.
1. Corrected a bug to do with missing values in linear splits for classification.
2. Fixed a problem to do with splits on categorical variables with more than 11 levels.
Changes in version 29.2.
1. Corrected some bugs to do with M variables in proportional hazards models.
2. Corrected some errors and typos in text and LaTeX output.
Changes in version 29.1.
1. Corrected a bug in splits on M variables.
2. Corrected some bugs in R and LaTeX output to do with M variables.
3. Corrected some bugs to do with P variables.
Changes in version 29.0.
1. Changed default SE pruning to 0-SE for stepwise regression.
2. Improved split point selection for polynomial models by using local mean imputation.
3. Introduced indicator ("I") and periodic ("P") variables.
4. Corrected a bug in LaTeX output.
Changes in version 28.1.
1. Introduced missing-value flag ("M") variables.
2. Made many cosmetic changes to text, LaTeX and R outputs.
3. Changed random seed for cross-validation pruning to yield same results whether observations with zero weight are included or excluded.
4. For quantile regression, changed chi-squared test so that zero-residuals join the smaller of positive or negative residual categories.
5. Changed the default proportion of split points for importance scoring to 100 percent (exhaustive search).
6. Changed default to exhaustive search if number of observations with positive weight is not greater than 1 million.
Changes in version 28.0.
1. Removed the option for vertical vs sideways tree diagrams. As a result, all previous input files need regeneration.
2. Made exhaustive search the default if number of observations with positive weights is < 1 million.
3. Corrected display of "Node MSE" in output.
4. Changed colors in LaTeX diagrams to be more color-blind friendly.
5. Added a sign (positive or negative) in front of each fit variable to indicate its slope in LaTeX diagrams.
6. Improved spacing of labels on sides of nodes in LaTeX output.
Changes in version 27.9.
1. Corrected a bug that occurred when there is an R variable and only one C variable.
Changes in version 27.8.
1. Corrected a bug in Gi stepwise and multiple regression when there are B variables.
Changes in version 27.7.
1. Corrected a recently introduced bug in Gi when a node has only one splittable N variable and no splittable S and C variables.
2. Added the number of terminal nodes in 0-SE tree to output.
Changes in version 27.6.
1. Corrected a mistake in multiple and stepwise linear options in Gi method.
Changes in version 27.5.
1. Changed "Classprior" to "Posterior" in output column label for classification trees.
2. Added minimum node sample size, minimum treatment sample size (if applicable), and maximum number of split levels to LaTeX figure caption.
3. Corrected a bug to do with simple linear prognostic control with least squares and Gi method.
Changes in version 27.4
1. Reverted to original treatment of categorical variables for Gi method (no merging of categories).
2. Added columns of regression coefficients for treatment indicators in fitted value file when R variable is present.
3. Added columns of class proportions in fitted value file for classification.
4. Made 0-SE the default for Gi method with multiple linear option.
Changes in version 27.3
1. Added code to let program exit gracefully with error message if priors or misclassification costs files are incorrect.
2. Corrected a bug to do with writing fitted values for multiresponse variables.
Changes in version 27.2
1. Corrected some errors in R code when there is a treatment variable.
Changes in version 27.1
1. Added columns of class sample sizes to file of predicted values for classification with simple node models.
2. Improved trimming of terminal nodes to ensure no siblings have same predictions in classification.
3. Improved display of node information in LaTeX classification trees.
Changes in version 27.0
1. Added class sizes to root node of LaTeX diagrams for classification.
2. Corrected some bugs to do with test-sample pruning.
Changes in version 26.9
1. Removed a trap in SELECT that caused program to abort when all total costs are infinite in EXHAUSTQ.
2. Corrected some formatting errors in output.
3. For Gi method, categorical predictor levels merged to 4 levels.
4. Corrected R output for multiresponse option.
Changes in version 26.8
1. Corrected some bugs in R output files.
2. Corrected a bug to do with minimum node size.
Changes in version 26.7
1. Increased range of minimum node sizes for non-default option.
2. Made default SE=0 for propensity scoring.
3. Changed minimum number of each treatment to 1 in each node for propensity scoring.
4. Added output line "Run GUIDE with the command: guide < ..." after data file creation.
5. Removed unused variables from appearing in R code function.
6. Always ask to write R code for prediction.
Changes in version 26.6
1. Corrected a bug in LaTeX caption when there is a treatment and an uncensored response variable.
Changes in version 26.5
1. Removed t-statistics and p-values from constant terms in proportional hazards models.
2. Added name of censored survival time in LaTeX tree diagrams.
3. Corrected a bug that surfaced when a treatment variable has more than 2 values.
Changes in version 26.4
1. Set scale factor to 1.1 for chisquare scores of categorical variables with 2-3 levels.
Changes in version 26.3
1. Corrected a bug that affected importance scoring when there are no categorical variables.
Changes in version 26.2
1. Corrected a bug that concerns best polynomial regression with censored response (linear prognostic control).
2. Corrected a bug that gave the wrong 2nd best split variable.
3. Added level names of R variable in output.
Changes in version 26.1
1. Changed cell boundaries for interaction tests to 0.33 and 0.67 quantiles.
Changes in version 26.0
1. Corrected an error to do with linear prognostic control in subgroup identification (R variable).
2. If there are N and F variables and an R variable, multiple linear regression is disallowed.
3. Increased default number of split points searched in N and S variables for importance scoring.
Changes in version 25.4
1. Corrected a bug that affected situations with only 1 S variable and no C and N variables.
Changes in version 25.3
1. Corrected a bug with split point selection for N variables in SELECT.
Changes in version 25.2
1. Corrected a bug to do with candidate split point calculation in SELECT.
2. Added version number in captions of latex tree diagrams.
Changes in version 25.1
1. Corrected a bug to do with chisquare p-values that are too small when prop. hazards model is used.
2. Made improvements in bootstrap selection bias reduction.
3. Reduced default minimum node sample size to 2 (from 3).
Changes in version 25.0
1. Corrected a bug to do with minimum treatment sample sizes when R variable is present.
2. Made cosmetic changes to LaTeX diagrams.
Changes in version 24.9
1. Corrected a bug that occurred when there are R and N variables and a split did not have all the R levels.
Changes in version 24.8
1. Updated LAPACK to 3.7.0 (except for Windows Intel version, which still uses 3.6.1).
2. Made a change to how GUIDE deals with the non-default option of fitting separate regression node models when there are missing values. Previously, it would terminate with a suggestion to use another missing data option. Now it automatically switches to the default option.
3. Corrected a bug in LaTeX tree diagram when there is an R variable, the linear prognostic control option is chosen, and variable names have more than 10 characters.
4. Added a change that removes node numbers in LaTeX codes of trees with more than 20 terminal nodes.
5. Added a change that reduces font size of LaTeX trees according to their size.
Changes in version 24.7
1. For regression trees, kept an entry in the input file for storing regressor names, in case GUIDE defaults to piecewise-constant fitting with a non-default missing value option.
2. Fixed a bug in split point selection that causes the Mac NAG version to seg fault.
3. Corrected a bug introduced in 24.6 to do with split set selection.
4. Ensured that split points for ordinal variables are midpoints between data values.
Changes in version 24.6
1. Corrected a bug with split point selection.
Changes in version 24.5
1. Restored linear splits in classification trees (with and without missing values) but not forests.
2. Made cosmetic changes to LaTeX output: categorical values that are too long are abbreviated and extra space beside node numbers is removed.
3. Arranged categorical splits so that fewer categorical values go to left node.
Changes in version 24.4
1. Corrected a bug in importance scoring.
Changes in version 24.3
1. Corrected a bug in GUIDE forest classification when all predictor variables are categorical.
2. Turned off LDA in GUIDE forest classification.
3. Disallow linear splits in classification if any S variable has missing values.
Changes in version 24.2
1. Added a space following node number in LaTeX files.
2. Improved formatting of categorical variable split values.
3. Corrected a bug to do with subgroup identification with R variable.
4. Ensured that all split points are midpoints between successive ordered data values.
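The midpoint rule in item 4 above can be illustrated with a minimal Python sketch. The function name is hypothetical, and collapsing duplicate values before taking midpoints is an assumption; handling of missing values is not shown.

```python
def candidate_split_points(values):
    """Return midpoints between successive distinct ordered data values.

    Assumption: repeated values are collapsed first, so no midpoint
    coincides with an observed data value.
    """
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


# Distinct ordered values are 1, 2, 3, 7, giving midpoints 1.5, 2.5, 5.0.
print(candidate_split_points([3, 1, 2, 2, 7]))
```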
Changes in version 24.1
1. Increased length of data values to 200 characters.
2. Corrected a bug concerning node colors in latex diagrams when R variable is present.
3. Removed sideways latex tree option.
Changes in version 24.0
1. Added option to change default minimum no. cases per treatment (requires changes to input files).
2. Reduced number of options at start of program.
3. Corrected an error in comment line of R code.
4. Removed pruning sequence in the output.
Changes in version 23.6
1. Corrected an error in coloring of nodes for subgroup identification.
Changes in version 23.5
1. Corrected an obscure bug.
2. Made cosmetic changes to improve manual.
Changes in version 23.4
1. Cosmetic changes due to revision of manual.
Changes in version 23.3
1. Removed the option of "0" for type of D variable in data conversion.
2. Corrected a bug that occurs if no latex output is requested.
3. Added a hint for model choice if subgroup identification is desired.
Changes in version 23.2
1. Corrected a bug in Intel version with regard to reading and writing double precision infinity.
2. Upgraded to Lapack 3.6.1.
Changes in version 23.1
1. Corrected a bug to do with non-latex option.
Changes in version 23.0
1. Corrected a bug in variable selection for splitting.
2. Added a catch to prevent splits that result in nodes without prognostic linear predictor when there is a treatment variable and polynomial model is fitted.
Changes in version 22.3
1. Modified structure of printed output in regressor name file for best polynomial model with treatment variable.
Changes in version 22.2
1. Corrected a bug to do with not splitting a node for least squares with N variables.
Changes in version 22.1
1. Corrected a bug that prevented subgroup identification with multiresponse data.
2. Updated Lapack to 3.6.1.
Changes in version 22.0
1. Fixed a bug in split variable selection when there is only one splittable N.
2. Added option for linear prognostic control in randomized experiments.
3. Made font improvements to LaTeX output.
4. Made several small bug fixes.
Changes in version 21.6:
1. Changed to using quartiles to discretize ordinal variables for chi-squared tests in split variable selection (previously, means and SDs were used).
Changes in version 21.5:
1. Improved unbiasedness of importance scores when there is a mix of ordinal and categorical variables.
2. Corrected an error in converting data files to other formats when there are header lines.
3. Corrected a bug in interaction splits.
Changes in version 21.4:
1. Corrected some errors in importance scores for highly non-uniform data.
2. Changed ranks of importance scores to midranks in case of ties.
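To illustrate what midranks mean for tied importance scores, here is a small Python sketch. The function name is hypothetical, and ranking from largest score (rank 1) to smallest is an assumption; the changelog only states that ties receive midranks.

```python
def midranks(scores):
    """Rank scores from largest (rank 1) to smallest.

    Tied scores share the average of the ranks they jointly occupy.
    Assumption: the largest score gets rank 1.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mid = (pos + 1 + end + 1) / 2  # average of 1-based ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = mid
        pos = end + 1
    return ranks


# The two scores tied at 4.0 occupy ranks 2 and 3, so each gets 2.5.
print(midranks([9.0, 4.0, 4.0, 1.0]))
```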
Changes in version 21.3:
1. Modified Wilson-Hilferty approximation to increase accuracy.
Changes in version 21.2:
1. Corrected a bug that affected multiresponse option when one or more D variables is completely missing in a node.
2. Changed sample size text in LaTeX files to italics.
Changes in version 21.1:
1. Stopped NAG compiler from printing underflow warnings.
2. Updated Lapack to 3.6.0.
3. Corrected a bug in output of min and max values for variables with all values missing.
4. Changed a default option for multiresponse models.
Changes in version 21.0:
1. Allowed data files to contain header lines; requires new input and description files.
2. Corrected a bug in importance scores when there are missing values.
3. Abbreviated variable names longer than 10 characters in latex output.
4. Corrected error in predicted values of training samples in ensemble methods.
Changes in version 20.5:
1. Added line for no pruning in output file if this option is chosen.
2. Corrected a bug to do with piecewise polynomial option when variable names are too long.
3. Increased output lengths of variable names from 20 to 60 characters.
Changes in version 20.4:
1. Fixed a bug in least squares fit when dependent variable values are constant.
2. Added output on pruning alpha sequence.
3. Added compiler info in output.
Changes in version 20.3:
1. Added a cosmetic change to batch input log.
2. Changed default option for missing values to mean imputation for
least-squares, quantile, Poisson and prop. hazards models.
3. Added some output statements to show progress.
4. Fixed an old bug in trim_nodes (only for classification with plurality rule).
Changes in version 20.2:
1. Corrected a bug to do with splitting on categorical variables with
many levels.
2. Corrected a bug to do with Poisson regression when there are
numerous zero response values.
Changes in version 20.1:
1. Corrected an I/O bug when a weight variable is present.
Changes in version 20.0:
1. Upgraded from Lapack 3.4.2 to 3.5.0.
2. Corrected a bug with input file for multiple responses when none is
missing.
3. Added ability to fit piecewise multiple linear proportional hazards
models with treatment variable.
4. Added ability to fit piecewise simple ANCOVA models with treatment
variable.
5. Added option to use mean imputation in quantile, Poisson, and
proportional hazards models.
6. Added option for propensity score grouping and causal modeling.
Changes in version 19.0:
1. Corrected a bug that potentially affected all applications with more than one categorical predictor variable.
2. Corrected a bug that affected applications with multiresponse data that do not use exhaustive search for split points.
3. Modified the procedure for DIF identification when p-values are 0.
Changes in version 18.7:
1. Corrected an error in linear interpolation of baseline cumulative hazard function.
Changes in version 18.6:
1. Corrected a bug to do with best simple polynomial model when there are missing values.
2. Corrected a bug to do with missing values in created products and powers.
3. Allowed tabular output to adapt to length of variable and class names.
4. Added values of log baseline cumulative hazard and median survival time to optional output for proportional hazards models.
Changes in version 18.5:
1. Corrected an error in a prompt for importance scoring.
2. Enabled variable names to use non-alphanumeric characters as long as 1st character is alphabetical. The characters #, %, {, }, and space (blank) are automatically replaced by dots.
3. Enabled any character to appear in a data value as long as it is enclosed in quotes.
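The variable-name rule in item 2 above can be sketched in Python as follows. The function name is hypothetical, and raising an error for a name that does not start with a letter is an assumption; the changelog does not say how such names are handled. Note also that Python's str.isalpha accepts non-ASCII letters, which may be broader than what GUIDE allows.

```python
def sanitize_variable_name(name: str) -> str:
    """Replace '#', '%', '{', '}' and spaces in a variable name with dots.

    Assumption: a name whose first character is not alphabetical is rejected.
    """
    if not name or not name[0].isalpha():
        raise ValueError("variable name must start with an alphabetical character")
    return "".join("." if ch in "#%{} " else ch for ch in name)


print(sanitize_variable_name("blood pressure"))  # blood.pressure
print(sanitize_variable_name("rate{a}"))         # rate.a.
```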
Changes in version 18.4:
1. Removed a twice repeated prompt for default options.
2. Made outputting a file with fitted values the default.
Changes in version 18.3:
1. Corrected a bug to do with LDA for multi-response with treatment data.
Changes in version 18.2:
1. Corrected a bug that occurs when all observations on a N or F variable are missing in a node.
Changes in version 18.1:
1. Added option to perform LDA in each node for multiresponse data
with a treatment variable.
2. Removed option to choose max proportion of variance for PCA. Now it
is fixed at 0.95.
3. Corrected a bug in piecewise simple linear regression when there
are missing values.
4. Changed a call to lapack gesdd to gesvd.
Changes in version 18.0:
1. Added default options to all models to reduce number of prompts.
2. Corrected a bug in boundary values for split variable selection.
3. Corrected a bug in channeling of missing values when there are none
in training sample and split is due to an interaction.
4. Made numerous aesthetic improvements to LaTeX output.
5. Added capability to perform differential item functioning.
Changes in version 17.11:
1. Corrected a bug in direction of missing values when splitting is
due to an interaction.
Changes in version 17.10:
1. Corrected a bug affecting classification and non-multiresponse
problems inadvertently introduced in previous revision.
2. Corrected caption in latex output.
Changes in version 17.9:
1. Added option to allow missing dependent variables in multiresponse regression.
Changes in version 17.8:
1. Corrected a mistaken trap that disallowed R variables for survival data.
Changes in version 17.7:
1. Corrected a bug in latex output when there are many values in a split set.
Changes in version 17.6:
1. Corrected a bug in the 64-bit linux version when there are more than 2 treatments.
2. Corrected a recently introduced bug in latex output.
Changes in version 17.5:
1. Increased the node size for multiresponse and longitudinal data when their number is small.
2. Changed default option for linear regression to impute with means if there are missing values.
3. Allowed stepwise regression to continue if number of variables exceeds sample size.
4. Changed some defaults for forest: #variables selected = #variables/3, mindat = max(5,n/200).
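The forest defaults in item 4 of version 17.5 can be sketched as simple arithmetic. This is only an illustration: the function name `forest_defaults` is hypothetical, and rounding down (with at least one variable selected) is an assumption, since the note above does not say how fractions are handled.

```python
def forest_defaults(n_vars, n_obs):
    """Hypothetical helper mirroring the version 17.5 forest defaults."""
    # Assumption: fractions are rounded down and at least one variable is kept.
    n_selected = max(1, n_vars // 3)   # #variables selected = #variables/3
    mindat = max(5, n_obs // 200)      # mindat = max(5, n/200)
    return n_selected, mindat


if __name__ == "__main__":
    print(forest_defaults(30, 10000))
```

With small samples the floor of 5 dominates, so mindat only grows once n exceeds 1000.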
Changes in version 17.4:
1. Allowed choice of alternative models for least-squares non-constant fits when training data have no missing values.
2. Corrected a bug in LDA for categorical splits with more than 12 levels.
3. Changed routine for computation of F cdf to fcdf to avoid difficulties with large dfs.
Changes in version 17.3:
1. Changed DIF scores to p-values.
Changes in version 17.2:
1. Corrected a bug to do with reading double precision numbers using ltxunit.
2. Allowed program to switch automatically to fitting constant models when number of complete cases is too small.
3. Allowed periods in variable names.
4. Corrected a bug to do with mean/mode imputation of S variables.
5. Revised method of computing DIF scores.
Changes in version 17.1:
1. Added default option to use PCA for variable selection in multiresponse regression.
2. Cleaned up R code output for multiresponse and longitudinal data options (5 & 6).
3. Corrected a bug that affected data set creation (option 3) when there is more than one dependent variable.
4. Removed option to normalize D variables for longitudinal data (option 6).
5. Corrected a bug that got into the previous version.
Changes in version 17.0:
1. Added option for subgroup identification with multiple dependent variables.
2. Corrected some bugs to do with split point and split set selection.
Changes in version 16.4:
1. Corrected some bugs that disabled pruning with test samples for classification.
2. Corrected a bug affecting variable importance scoring.
Changes in version 16.3:
1. Ensured that categorical values not present in a node are not shown in splits in latex and text diagrams.
2. Removed option to overwrite existing files in data conversion.
Changes in version 16.2:
1. Cleaned up text, latex and R codes for tree structures to remove redundancies due to missing values.
2. Allowed automatic switching to piecewise-constant models if number of complete cases is too small.
3. Removed quotes in last instruction for using batch file.
Changes in version 16.1:
1. Added 3 options for missing values in N and F variables in least squares regression.
2. Changed splits on categorical variables: if a node is split on a C variable, the smaller subnode is placed on the left side. This causes all unseen categories to go to the larger (right) subnode.
3. For option 1, where a constant is fitted to obs with missing values, the constant is changed to the mean of the missing obs.
4. Corrected a bug in R code for prediction function.
5. Added a column of observed response values to predicted value output file for ensemble methods and changed the predicted value column heading from "Predicted" to "predicted".
6. Added double quotes around values of categorical variables in outputs.
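The categorical-split convention in item 2 of version 16.1 can be sketched as follows. The helper names `orient_split` and `route` are hypothetical, and sending ties to the first set is an assumption. The point is only that a category absent from the left set, including one never seen in training, falls through to the right (larger) subnode.

```python
def orient_split(counts, set_a):
    """Return the side of a two-way partition that should become the left set.

    counts: dict mapping each category in the node to its number of observations
    set_a:  one side of the partition (a set of categories)
    """
    set_b = set(counts) - set_a
    n_a = sum(counts[c] for c in set_a)
    n_b = sum(counts[c] for c in set_b)
    # Assumption: the smaller side goes left; ties keep set_a on the left.
    return set_a if n_a <= n_b else set_b


def route(value, left_set):
    """Send a value left only if it is in the left set; everything else goes right."""
    return "left" if value in left_set else "right"


if __name__ == "__main__":
    counts = {"a": 5, "b": 50, "c": 10}
    left = orient_split(counts, {"b"})
    print(sorted(left), route("a", left), route("unseen", left))
```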
Changes in version 16.0:
1. Simplified dialog when there is a weight variable. This necessitates input files to be recreated.
Changes in version 15.15:
1. Made cosmetic changes to node numbers in LaTeX output.
2. Corrected some labeling errors in subgroup identification options.
Changes in version 15.14:
1. Corrected a bug to do with importance ranking batch file creation for non least squares problems.
2. Added a default option to switch to piecewise constant trees if there are missing regressor values in linear regression.
Changes in version 15.13:
1. Corrected a bug that concerns least-squares simple polynomial models
2. Reduced the number of available models to Gs and Gi for subgroup identification.
3. Added quotes around character strings for nominal attributes in
ARFF formatted data files. If the total length of the values of an
attribute is more than 200 characters (including commas and quotes),
it is declared as string; otherwise it is declared as nominal.
Changes in version 15.12:
1. Corrected a bug that affected stepwise regression.
Changes in version 15.11:
1. Corrected a bug with missing censored survival times.
Changes in version 15.10:
1. Corrected a bug with smallest uncensored survival time.
2. For importance scoring, changed the default expected number of
noise variables found important to be 0.05 of the total number of
noise variables.
3. Corrected an error in figure caption of latex tree for proportional hazards models.
Changes in version 15.9:
1. Added option to print out regression coefficients to a file for
proportional hazards models.
2. Corrected a bug in computation of smallest uncensored time when weights are present.
3. Corrected a bug in generation of input files for importance scoring.
Changes in version 15.8:
1. Added a check that number of observations does not exceed 2^32.
2. Corrected a bug with no crossing of treatment and factors in PRELS.
Changes in version 15.7:
1. Corrected a bug in sideways LaTeX output.
2. Increased horizontal spacing of nodes in LaTeX trees.
3. Corrected some bugs in differential treatment options.
Changes in version 15.6.1:
1. Corrected a bug in multiresponse option when there is only one numeric variable.
Changes in version 15.6:
1. Corrected a bug in batch mode when non-default number of CV folds is used.
2. Increased default values of min_dat (constant regression and simple
classification) and lev_splits (single trees).
3. Changed minimum value from Wilson-Hilferty output to 0.001
(previously 0) to avoid problems with importance ranking when all
variables are insignificant.
4. Allowed continued splitting for importance scoring if interaction
tests fail (previously, the node was made terminal).
5. Changed the default values for max number of split levels and min
number of observations in each node.
6. Added capability for item response data.
Changes in version 15.5:
1. Corrected a bug in LaTeX output.
2. Reinstated choice of univariate or bivariate kernel and nearest-neighbor fits.
3. Added weights to multiresponse option.
4. Improved linear split algorithm to switch to univariate splits when
crimcoords are too large or too small.
Changes in version 15.4:
1. Added the Li-Martin method to approximate an F quantile with a
chi-square quantile (used only in Gi method).
2. Corrected a bug when df=0 in Gi method.
Changes in version 15.3:
1. Corrected some bugs and changed the format of LaTeX trees.
Changes in version 15.2:
1. Fixed a bug concerning spaces in values of treatment variables.
2. Added a trap to prevent building classification trees if a
treatment (R) variable is present.
Changes in version 15.1:
1. Improved linear split option by searching over all pairs.
2. Added a linear split option to bagged GUIDE.
3. Corrected a bug in pruned tree for least median of squares regression.
4. Added an option to show #misclassified/sample size in LaTeX trees.
5. Corrected some bugs in display of LaTeX tree diagrams
Changes in version 15.0:
1. Improved splits on missing values.
2. Corrected a bug in Ancova models.
3. Corrected a bug in R code.
4. Added checks to ensure that all treatment values are present in all splits.
5. Added more information in LaTeX tree diagrams.
Changes in version 14.2:
1. Added a restriction to at most 10 groups for multiresponse and longitudinal data.
2. Corrected a bug in R code for multiresponse data.
Changes in version 14.1:
1. Increased length of character strings in output file of predicted values.
2. Corrected a bug in latex output when categorical values contain spaces.
3. Added node sample sizes to latex trees.
4. Corrected a bug in R code when B variables have no missing values.
Changes in version 14.0:
1. Revised options for applications with treatment variables.
2. Revised latex output.
3. Added option to produce R code for prediction of future cases.
Changes in version 13.4:
1. Modified the LaTeX output to include class sizes in each terminal
node of a classification tree and to say whether the model uses
estimated, equal or specified priors and unit or unequal
misclassification costs.
2. Fixed a bug in data reformatting option.
3. Added a new subgroup identification method.
Changes in version 13.3.2:
1. Removed a redundant and erroneous prompt for regression with weights.
Changes in version 13.3:
1. Corrected a bug in stepwise regression option that was introduced in a prior version.
2. Added header to fitted probability file for kernel density option in classification
3. Added "pstree[treemode=D]"
Changes in version 13.2:
1. Corrected a bug that caused an infinite loop in the stepwise regression option.
Changes in version 13.1:
1. Corrected the algorithm for linear splits.
2. Lowered default value of mindat to 2% of sample size.
3. Made some cosmetic changes to the output.
Changes in version 13.0:
1. Added a classification option for bivariate linear splits.
2. Corrected a bug with linear splits on missing values.
3. Corrected a mistake in output description file for importance scores.
Changes in version 12.6:
1. Allowed N and F variables with R variable.
Changes in version 12.5:
1. Changed value of contab chi-squared statistic to 0 (instead of 1) in case of errors.
2. Avoided computation of p-values; used Wilson-Hilferty in all cases.
3. If an R variable is present, made sure that each subnode from a
split has at least two R levels.
4. Contab chi-squared tests are computed for each level of the R variable, if present.
5. For the Gi (option 2) R method, the test statistic is the maximum
of exponential quantiles.
Changes in version 12.4:
1. Corrected a bug that affected survival data.
Changes in version 12.3:
1. Corrected a bug involving option 3 for treatment variable.
Changes in version 12.2:
1. Corrected several bugs that were introduced around version 12.0.
2. Corrected a bug in lsdev in poisson.f90 regarding NFIT for stepwise regression.
Changes in version 12.1:
1. Broke ties in chi-squared values and raised the ceiling for max
value for split variable selection.
2. Corrected an error in output of intermediate node sample means.
3. Added output info about missing values in splits.
Changes in version 12.0:
1. Corrected several bugs in Poisson and proportional hazards models.
2. Disallowed "N", "F" and "B" variables when an "R" variable is present.
3. Changed default in LaTeX trees to not print node numbers.
4. Changed search for splits on categorical variables to exhaustive
search for 9 or fewer categories for all except classification and
quantile regression.
Changes in version 11.7:
1. Skipped bootstrap calibration if there are only B and no S variables.
2. Skipped bootstrap calibration for constant models.
3. Corrected a bug introduced in 11.6.
4. Reverted back to default choice of "1" for "r" variables.
5. Updated lapack and lapack95 libraries to 3.4.1 and 3.0, resp.
Changes in version 11.6:
1. Corrected several bugs that affected Poisson and proportional hazards models.
2. Corrected bug that left out bootstrap calibration.
3. Changed the maximum length of character data entries to 50.
4. Added several traps for floating-point overflows and underflows.
Changes in version 11.5:
1. Fixed a bug about file name of predicted probabilities for kernel classification.
2. Fixed a bug in batch file creation for longitudinal data.
Changes in version 11.4:
1. Made non-exhaustive search the default.
2. Corrected some inconsistencies with regard to R variables.
3. Changed the license to BSD.
Changes in version 11.3:
1. Corrected a bug that affected missing D values in multiresponse data.
2. Changed LaTeX output to indicate where missing values go.
3. Changed LaTeX output for multiresponse and longitudinal data.
4. Added more options for treatment (R) variables.
Changes in version 11.2:
1. Corrected a bug involving names of variables that are too long in
output for polynomial models.
Changes in version 11.1:
1. Re-introduced choice of mean or median based CV pruning.
2. Added ability to deal with unbalanced observation times in longitudinal models.
3. Corrected some bugs in proportional hazards models when there are
cases with censored survival times less than smallest uncensored
survival time.
4. Made median (instead of mean) CV estimate the default for pruning
for proportional hazards models.
5. Numerous cosmetic changes.
Changes in version 10.6:
1. Corrected a bug that mixed itpos with izpos in proportional hazards models.
2. Required data for fitting proportional hazards models to have some
censored and some uncensored data.
3. Automatically discounted cases with censored survival times less
than smallest uncensored time.
Changes in version 10.5:
1. Stopped R variables from being used to split the nodes.
2. Made a node terminal during LDA splits if the LDA routine returns an error.
Changes in version 10.4:
1. Changed the default grouping for multiple dependent variables to be
ungrouped; up to 20 groups are allowed.
2. Corrected a mistake in the LaTeX figure caption.
3. Added two-sided p-values beside t-statistics in output.
Changes in version 10.3:
1. Allowed use of N variables in importance score option.
Changes in version 10.2:
1. Changed the output of importance scores.
2. Corrected a consistency problem in importance scores for quantile regression.
3. Added printing of D variable name(s) in output file.
Changes in version 10.1:
1. Corrected a bug in computation of linear discriminant splits.
Changes in version 10.0:
1. Corrected a bug in computation of chi-squared test statistic.
2. Corrected a bug in computation of linear splits.
3. Added "R" variable type to indicate treatment variables.
4. Added option for multi-response and longitudinal data.
5. Added option to draw LaTeX trees sideways.
Changes in version 9.4:
1. Corrected another bug concerning splits on missing predictor values.
Changes in version 9.3:
1. Corrected a bug concerning splits on missing predictor values.
2. Allowed importance scoring for piecewise-linear regression models.
Changes in version 9.2:
1. Corrected a bug that occurs with missing values in the D variable for classification.
Changes in version 9.1:
1. Corrected a bug that occurs with missing values in the D variable for regression.
Changes in version 9.0:
1. Changed the way missing values for ordered variables are
handled. Now if there are missing values in a split variable, a split
on missingness is one of the splits considered. If a split is on a
non-missing value, observations with missing values are channeled
through the split by replacing them with the mean of the non-missing
values in the split variable.
2. Added an option for multi-response dependent variables.
3. Increased maximum length of character strings to 80.
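The channeling rule in item 1 of version 9.0 can be sketched as below. The names `node_mean` and `send_left` are hypothetical, and the split form `x <= cut` is an assumption; the note only says that missing values are replaced by the mean of the non-missing values before passing through the split.

```python
def node_mean(values):
    """Mean of the non-missing values of the split variable in a node."""
    observed = [v for v in values if v is not None]
    return sum(observed) / len(observed)


def send_left(x, cut, mean):
    """Route one observation through a split on non-missing values."""
    if x is None:
        # Missing value: substitute the node mean before applying the split.
        x = mean
    return x <= cut  # assumed split form


if __name__ == "__main__":
    values = [1.0, None, 3.0, 8.0]
    m = node_mean(values)
    print(m, [send_left(v, 5.0, m) for v in values])
```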
Changes in version 8.4:
1. Corrected a bug (introduced in ver.8.2) in calculation of
chi-square probabilities. The Wilson-Hilferty approximation is used
for very small p-values.
2. Fixed a bug in interaction splits on one categorical and one ordered variable.
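For reference, the Wilson-Hilferty approximation named in item 1 of version 8.4 treats the cube root of a chi-square variable divided by its df as approximately normal with mean 1 - 2/(9 df) and variance 2/(9 df). The sketch below is the textbook formula only; the function names are hypothetical, and how GUIDE applies it internally is not stated in these notes.

```python
import math


def wilson_hilferty_z(chi2, df):
    """Approximate standard normal deviate for a chi-square statistic."""
    c = 2.0 / (9.0 * df)
    return ((chi2 / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)


def upper_tail_p(chi2, df):
    """Approximate upper-tail p-value from the normal deviate."""
    z = wilson_hilferty_z(chi2, df)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # 18.307 is the tabulated 95th percentile of chi-square with 10 df.
    print(upper_tail_p(18.307, 10))
```

Working on the normal-deviate scale avoids computing tail probabilities that would underflow for very large statistics, which is presumably why the note mentions very small p-values.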
Changes in version 8.3:
1. Fixed some bugs with batch file creation and execution.
Changes in version 8.2:
1. Corrected a bug that affected batch operation for importance scoring.
2. Changed to direct chi-square conversion instead of Wilson-Hilferty
approximation if df < 10.
3. Corrected bug in printing of importance scores.
Changes in version 8.1:
1. Corrected a bug triggered by missing values in the dependent variable.
Changes in version 8.0:
1. Added option for bagging and random forest ensembles.
2. Changed method of dealing with missing values. Missing values in
ordered variables used for splitting are treated as negative
infinity and observations are predicted with node sample mean if
there are missing values in regressors.
3. Added an option to obtain importance scores in two applications of GUIDE.
4. Added linear splits for classification trees.
5. Better control of interaction and linear split searches.
6. Improved split point and value set selection in interaction splits.
7. Fixed many small bugs.
Changes in version 7.9:
1. Added kernel and nearest-neighbor models for classification trees
Changes in version 7.8:
1. Corrected more bugs in polynomial regression option
Changes in version 7.7:
1. Corrected a bug in polynomial regression option
Changes in version 7.6:
1. Added the option to output a separate file containing scaled
importance scores and variable names.
2. Implemented an improved variable split selection method for data
with missing values.
Changes in version 7.5:
1. Corrected a bug in option #3 when there is no D variable in description file
2. Corrected a bug that affects formatting of ARFF data (option 3).
Changes in version 7.4:
1. Corrected a bug in linear splits when some observations have
missing values in the dependent variable
2. Added the option to draw LaTeX classification trees without node colors
Changes in version 7.3:
1. Require at least one B variable for stepwise simple ANCOVA
Changes in version 7.2:
1. Added checks to prevent over-writing of files
2. Changed default SE-rule to 0 for proportional hazards models
3. Fixed a (cosmetic) bug in the output of split values for
categorical variables with many values
4. Changed node information in LaTeX figures for classification trees
5. Eliminated printout of variable roles in importance ranking output
6. Increased the number of colored leaf nodes to 18 for classification
tree LaTeX diagrams
Changes in version 7.1:
1. Fixed a bug introduced in 7.0 that affected stepwise and polynomial models
2. Fixed a bug involving importance score cut-off
Changes in version 7.0:
1. Added kernel and nearest-neighbor models for classification trees
2. Added splits on linear combinations of two variables for classification
3. Improved the way variables are selected for splits and the way
split points are selected
4. Added coloring and other cosmetic features to LaTeX output
5. Improved the importance ranking method
Changes in version 6.2:
1. Changed algorithm for splits on categorical variables in classification
2. Added new data formats: C4.5 and ARFF
Changes in version 6.1:
1. Fixed a bug that wrote messages to fort file during batch file creation
2. Fixed a bug in calculation of estimated class priors when there is
a weight variable
Changes in version 6.0.1:
1. Fixed a bug that miscalculated test sample misclassification cost
when some classes are absent in the test sample
Changes in version 6.0:
1. Added classification tree capability
2. Added random forest capability
3. Made some changes to split selection algorithms
Changes in version 5.3:
1. Corrected a bug that affected latex files when there are no
categorical variables
Changes in version 5.2:
1. Changed variable selection for interaction tests to use two levels
of splits
2. Reverted to true stepwise and ancova fitting for split selection
for these options
Changes in version 5.1:
1. Fixed a bug caused by missing values while reading data
Changes in version 5.0:
1. Improved approach to interaction tests to account for their number
2. Changed default SE for constant fit to 0
3. Added option for variable importance scores and for identification
of unimportant variables
Changes in version 4.4:
1. Fixed a bug in split variable selection routine that affected "s"
variables
Changes in version 4.3:
1. Made the program output progress after each CV iteration.
2. Extended the length of variable names from 8 to 10 for data
conversion to SAS.
3. Added code for PROC GLM and PROC REG if SAS output is selected.
4. Added a suggestion to use white or yellow colors if leaf node
numbers are selected.
5. Allow execution to continue if an excluded variable contains values
longer than 20 characters.
6. Allow option 3 (data conversion) to proceed if there are data
values longer than 20 characters, with a warning that they are
truncated.
Changes in version 4.2:
1. Corrected a bug that affects relative risk regression when some D
or T variables have missing values.
2. Added an option for weighted or unweighted error estimation when a
weight variable exists. The default is unweighted.
3. Changed from zero-truncated normal to 1-df chi-square statistic for
split variable selection.
Changes in version 4.1:
1. Corrected a bug in output for truncation type 2.
2. Reverted default value of mindat to 0 for stepwise option.
Changes in version 4.0:
1. Added an option for least median of squares (robust) regression for
multiple and best simple linear fitting.
2. Increased amount of information output to (optional) file containing
names and regression coefficients in leaf nodes.
3. Changed absolute z to truncated z for variable selection and
bootstrap bias correction.
4. Added an option to save multiple regression coefs in a separate file.
5. Added an option to fit piecewise least-squares multiple linear
regression without intercept terms.
6. Added an option to not truncate, truncate fitted values, or
truncate x-values before prediction.
7. Added option to drop insignificant leading powers in polynomial models.
8. Added option for stepwise simple linear ANCOVA.
9. Allowed stepwise linear option to use "c" or "b" categorical variables.
10. Fixed a bug that affected datasets with missing values but without
weight variables.
11. Fixed a bug in split point selection to use total mean deviance
instead of total deviance.
12. Added option for all subsets regression.
13. Improved search over split points for non-exhaustive search.
14. Changed option for non-exhaustive search from fraction to number.
Changes in version 3.1:
1. For stepwise and polynomial regression, added option to write the
leaf node number and the selected regressors into a separate file.
2. Added option for colored leaf nodes and improved font sizes in
LaTeX tree diagrams.
3. Added a column to the optional file of node IDs and fitted values to
indicate training observations.
4. Added R-squared value for tree model (least-squares fit only).
5. Fixed bug in interaction test. Now preference for "c" variable is
given only to multiple linear regression.
6. Increased number of trees in pruning sequence.