October 27, 2005
Google News Report USA Score
Fetch headlines from Google News on a schedule, then rank the
headlines by several factors:
* day and time of appearance,
* prominence on the Google News page,
* number of appearances,
* others.
The factors are weighted to estimate the referer traffic these links
bring to their source sites.
Listed are the top-scoring stories in recent time periods, followed
by a ranking of sources. More detailed reports are linked at the
bottom of each table.
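Below is a minimal Python sketch of the kind of weighted scoring described above. The post gives neither its weights nor its formula. Every weight here (`position_weight`, `hour_weight`, `weekday_weight`) and the `Appearance` record are hypothetical placeholders. The number of appearances enters only because each scheduled fetch that sees a headline adds to its score.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Appearance:
    """One sighting of a headline during a scheduled fetch (hypothetical record)."""
    headline: str
    source: str
    weekday: int   # 0 = Monday ... 6 = Sunday
    hour: int      # hour of day, 0-23
    position: int  # 1 = most prominent slot on the page

def position_weight(position: int) -> float:
    return 1.0 / position                     # assumed: prominence decays with slot

def hour_weight(hour: int) -> float:
    return 1.0 if 8 <= hour < 18 else 0.5     # assumed: daytime readers click more

def weekday_weight(weekday: int) -> float:
    return 1.0 if weekday < 5 else 0.7        # assumed: lighter weekend traffic

def appearance_score(a: Appearance) -> float:
    return position_weight(a.position) * hour_weight(a.hour) * weekday_weight(a.weekday)

def rank(appearances, key):
    """Sum appearance scores grouped by `key` (headline or source), highest first."""
    totals = defaultdict(float)
    for a in appearances:
        # Every fetch that sees the item adds to its total, so repeated
        # appearances raise the score without a separate count term.
        totals[key(a)] += appearance_score(a)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

`rank(appearances, key=lambda a: a.headline)` gives a story table. `rank(appearances, key=lambda a: a.source)` gives a source ranking.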
Posted by omor at 02:38 PM | Comments (0)
October 15, 2005
Joint regression analysis
Joint regression analysis studies genotype-by-environment interaction
by relating genotype effects and/or interaction effects within individual
environments to environmental effects.
The interaction sum of squares is divided into two parts:
* one part represents the heterogeneity of linear regression
coefficients while
* the second represents the pooled deviations from individual
regression lines.
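The partition above can be sketched in Python with NumPy for a genotype-by-environment table of cell means. The sketch follows the common Finlay-Wilkinson convention of using environment means, centred on the grand mean, as the environmental index. That convention, the function name and the input layout are assumptions, not taken from the post.

```python
import numpy as np

def joint_regression_ss(y):
    """Split the genotype-by-environment interaction sum of squares.

    y: (n_genotypes, n_environments) table of cell means.
    Returns (ss_interaction, ss_heterogeneity, ss_deviations, slopes).
    """
    y = np.asarray(y, dtype=float)
    grand = y.mean()
    g_means = y.mean(axis=1, keepdims=True)       # genotype means
    e_means = y.mean(axis=0, keepdims=True)       # environment means
    index = (e_means - grand).ravel()             # environmental index I_e (sums to 0)
    ge = y - g_means - e_means + grand            # interaction residuals
    ss_ge = float((ge ** 2).sum())
    sxx = float((index ** 2).sum())
    sxy = ge @ index                              # one cross-product per genotype
    # Heterogeneity of regressions: SS explained by regressing each genotype's
    # interaction row on the index; the remainder is the pooled deviations.
    ss_het = float((sxy ** 2).sum() / sxx)
    ss_dev = ss_ge - ss_het
    slopes = 1.0 + sxy / sxx                      # slope of each genotype's response on I_e
    return ss_ge, ss_het, ss_dev, slopes
```

If every genotype responds exactly linearly to the index, the pooled deviations are zero and all interaction is heterogeneity of slopes.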
Posted by omor at 10:31 AM | Comments (0)
September 08, 2005
Hospital Length of Stay: Mean or Median Regression
Length of stay (LOS) is an important measure of hospital activity and
health care utilization, but its empirical distribution is often
positively skewed.
Objective. This study reviews the mean and median regression
approaches for analyzing LOS, which have implications for service
planning, resource allocation, and bed utilization.
Methods. The two approaches are applied to analyze hospital discharge
data on cesarean delivery. Both models adjust for patient and
health-related characteristics, and for the dependency of LOS outcomes
nested within hospitals. The estimation methods are also compared in a
simulation study.
Results. For the empirical application, the mean regression results
are somewhat sensitive to the magnitude of trimming chosen. The
identified factors from median regression, namely number of diagnoses,
number of procedures, and payment classification, are robust to
high-LOS outliers. The simulation experiment shows that median
regression can outperform mean regression even when the response
variable is moderately positively skewed.
Conclusion. Median regression appears to be a suitable alternative to
analyze the clustered and positively skewed LOS, without transforming
and trimming the data arbitrarily.
Analyzing Hospital Length of Stay: Mean or Median Regression?
Medical Care. 41(5):681-686, May 2003.
Lee, Andy H.; Fung, Wing K.; Fu, Bo
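This small NumPy sketch shows why median regression resists high-LOS outliers. It fits ordinary least squares (mean regression) and least absolute deviations (median regression) to the same made-up data. The median fit uses iteratively reweighted least squares. That is one standard way to compute it, not necessarily the paper's method. The sketch also ignores the within-hospital clustering that the paper adjusts for.

```python
import numpy as np

def ols(X, y):
    """Mean regression: ordinary least squares coefficients."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def lad(X, y, n_iter=200, eps=1e-8):
    """Median regression (least absolute deviations) via iteratively reweighted least squares."""
    beta = ols(X, y)                              # start from the mean fit
    for _ in range(n_iter):
        resid = np.abs(y - X @ beta)
        w = 1.0 / np.maximum(resid, eps)          # large residuals get small weights
        sw = np.sqrt(w)
        beta = ols(X * sw[:, None], y * sw)       # weighted least squares step
    return beta
```

Take `x = np.arange(10.0)`, `y = 2 + x` and a design matrix with an intercept column. Adding 100 to the last response drags the OLS slope to about 6.45. The median fit stays at intercept 2 and slope 1.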
Posted by omor at 03:07 PM | Comments (0)
August 29, 2005
R Graphics (Paul Murrell) is out
R Graphics by Paul Murrell has shipped.
Previously announced.
With bonus New Zealand content.
Posted by omor at 05:02 PM | Comments (0)
August 24, 2005
MCMC method bandwidth selection for multivariate kernel density estimation
Kernel density estimation for multivariate data is an important
technique that has a wide range of applications in econometrics and
finance. Its relatively limited use is mainly due to the increased
difficulty of deriving an optimal data-driven bandwidth as the
dimension of the data increases. We provide Markov chain Monte Carlo
(MCMC) algorithms for estimating optimal bandwidth matrices for
multivariate kernel density estimation.
Our approach is based on treating the elements of the bandwidth matrix
as parameters whose posterior density can be obtained through the
likelihood cross-validation criterion. Numerical studies for bivariate
data show that the MCMC algorithm generally performs better than the
plug-in algorithm under the Kullback-Leibler information criterion.
Numerical studies for five-dimensional data show that our algorithm is
superior to the normal reference rule.
MCMC method bandwidth selection for multivariate kernel density
estimation
Session Nonparametric Estimation II
Field Econometrics
Session Chair Qi Li, Texas A&M University
Presenter(s) Maxwell King, Monash University
Co-Author(s) Xibin Zhang, Department of Econometrics and Business
Statistics, Monash University and Rob Hyndman, Monash University
Topics Semi/Nonparametrics
Keywords Cross-validation, Kullback-Leibler information, Mean
integrated squared errors, Monte Carlo kernel likelihood and Sampling
algorithms
JEL Codes C11, C14, C51
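This simplified NumPy sketch shows the idea. It treats the bandwidths as parameters and uses the leave-one-out (likelihood cross-validation) log-likelihood as the log posterior. It then runs random-walk Metropolis. To keep it short, the sketch assumes a diagonal bandwidth matrix, a Gaussian product kernel and a flat prior on log bandwidths. The paper estimates full bandwidth matrices, and its priors and sampler may differ.

```python
import numpy as np

def loo_log_likelihood(data, h):
    """Leave-one-out log-likelihood of a Gaussian product kernel with bandwidths h."""
    n, d = data.shape
    z = (data[:, None, :] - data[None, :, :]) / h               # (n, n, d) scaled differences
    log_k = (-0.5 * (z ** 2).sum(axis=2)
             - 0.5 * d * np.log(2 * np.pi) - np.log(h).sum())   # log kernel values
    np.fill_diagonal(log_k, -np.inf)                            # leave each point out
    top = log_k.max(axis=1, keepdims=True)                      # log-sum-exp for stability
    log_dens = top[:, 0] + np.log(np.exp(log_k - top).sum(axis=1)) - np.log(n - 1)
    return log_dens.sum()

def mcmc_bandwidth(data, n_iter=2000, step=0.1, seed=0):
    """Random-walk Metropolis on log bandwidths; returns posterior-mean bandwidths."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    # Start from a normal-reference-style guess: sigma * n^(-1/(d+4)).
    log_h = np.log(data.std(axis=0) * n ** (-1.0 / (d + 4)))
    current = loo_log_likelihood(data, np.exp(log_h))
    draws = []
    for _ in range(n_iter):
        proposal = log_h + step * rng.standard_normal(d)
        candidate = loo_log_likelihood(data, np.exp(proposal))
        # Symmetric proposal and flat prior on log h: Metropolis acceptance.
        if np.log(rng.uniform()) < candidate - current:
            log_h, current = proposal, candidate
        draws.append(np.exp(log_h))
    return np.mean(draws[n_iter // 2:], axis=0)                 # drop first half as burn-in
```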
Posted by omor at 05:59 PM | Comments (0)
August 23, 2005
Curve Forecasting by Functional Autoregression
This paper explores prediction in time series in which the data is
generated by a curve-valued autoregression process. It develops a
novel technique, the predictive factor decomposition, for estimation
of the autoregression operator, which is designed to be better suited
for prediction purposes than the principal components method.
The technique is based on finding a reduced-rank approximation to the
autoregression operator that minimizes the norm of the expected
prediction error. The new method is illustrated by an analysis of the
dynamics of the term structure of Eurodollar futures rates. We restrict the
sample to the period of normal growth and find that in this subsample
the predictive factor technique not only outperforms the principal
components method but also performs on par with the best available
prediction methods.
Curve Forecasting by Functional Autoregression
Presenter(s) Alexei Onatski, Columbia University
Co-Author(s) Vladislav Kargin, Cornerstone Research
Session Chair James Stock, Harvard University
Topics Financial Econometrics, Forecasting, State Space and Factor
models and Time Series
Keywords Dimension reduction, Functional data analysis,
Generalized eigenvalue problem, Interest rates, Predictive factors,
Principal components, Reduced-rank regression and Term structure
JEL Codes C23, C53, E43
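The paper's estimator is not reproduced here. What follows is a hedged NumPy illustration of the underlying idea: a reduced-rank approximation chosen for prediction error, rather than for the variance of the predictors as principal components would be. It is plain reduced-rank regression for an AR(1) on curves sampled on a common grid. Treating the curves as finite vectors and weighting all errors equally are simplifying assumptions. The predictive factor method deals with the infinite-dimensional operator and differs in detail.

```python
import numpy as np

def reduced_rank_ar1(curves, rank):
    """Rank-constrained AR(1) for curves on a common grid.

    curves: (T, p) array, one discretized curve per row.
    Returns a (p, p) matrix B_k of rank <= `rank` such that
    X[t+1] ~ X[t] @ B_k for the mean-centred curves.
    """
    X = curves - curves.mean(axis=0)
    past, future = X[:-1], X[1:]
    # Unrestricted least-squares operator: future ~ past @ B.
    B, *_ = np.linalg.lstsq(past, future, rcond=None)
    fitted = past @ B
    # Reduced-rank regression: keep the `rank` directions carrying the most
    # predictable (fitted) variation and project the operator onto them.
    _, _, vt = np.linalg.svd(fitted, full_matrices=False)
    V = vt[:rank].T
    return B @ V @ V.T
```

A one-step forecast of the next curve is `(curves[-1] - curves.mean(axis=0)) @ B_k + curves.mean(axis=0)`. With `rank` equal to the grid size, the sketch reduces to the unrestricted least-squares operator.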
Posted by omor at 06:15 PM | Comments (0)
August 20, 2005
Functional data analysis (FDA)
Functional data analysis (FDA) handles longitudinal data and treats
each observation as a function of time (or of another variable). The
goal is to analyze a sample of related functions instead of a sample
of related points.
FDA differs from traditional data analytic techniques in a number of
ways. Functions can be evaluated at any point in their domain.
Derivatives and integrals, which may provide better information (e.g.
graphical) than the original data, are easily computed and used in
multivariate and other functional analytic methods.
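This minimal NumPy sketch illustrates the point that a smoothed function can be evaluated, and differentiated, anywhere in its domain. It fits a small Fourier basis by least squares to equally spaced samples of a periodic curve. The basis choice and the `FourierCurve` helper are illustrative assumptions; B-spline bases are the other common choice in FDA.

```python
import numpy as np

def fourier_basis(t, n_harmonics, deriv=0):
    """Fourier basis on [0, 1) (deriv=0) or its first derivative (deriv=1)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    cols = [np.ones_like(t) if deriv == 0 else np.zeros_like(t)]
    for k in range(1, n_harmonics + 1):
        w = 2 * np.pi * k
        if deriv == 0:
            cols += [np.cos(w * t), np.sin(w * t)]
        else:  # first derivative of each basis function
            cols += [-w * np.sin(w * t), w * np.cos(w * t)]
    return np.column_stack(cols)

class FourierCurve:
    """One observation stored as a function: basis coefficients, not raw points."""

    def __init__(self, t, y, n_harmonics=3):
        self.n_harmonics = n_harmonics
        basis = fourier_basis(t, n_harmonics)
        self.coef, *_ = np.linalg.lstsq(basis, np.asarray(y, dtype=float), rcond=None)

    def __call__(self, t):
        return fourier_basis(t, self.n_harmonics) @ self.coef

    def derivative(self, t):
        return fourier_basis(t, self.n_harmonics, deriv=1) @ self.coef
```

Fitted to 20 samples of sin(2πt), the curve can be evaluated between the sample points. Its derivative at t = 0 comes out as 2π.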
S+Functional Data Analysis User's Guide
by Douglas B. Clarkson, Chris Fraley, Charles C. Gu, James O. Ramsay
Functional Data Analysis (Springer Series in Statistics) (Hardcover)
by J. Ramsay, B. W. Silverman
Covers topics of linear models, principal components, canonical
correlation, and principal differential analysis in function spaces.
Applied Functional Data Analysis (Paperback)
by J.O. Ramsay, B.W. Silverman
Bernard W. Silverman's code site for Applied Functional Data Analysis: Methods and Case Studies.
See also FunctionalData.org, and Function-valued traits.
Posted by omor at 10:50 PM | Comments (0)
August 19, 2005
Mathematical Statistics with MATHEMATICA
Mathematical Statistics with MATHEMATICA,
Colin Rose, Murray D. Smith (Hardcover)
The mathStatica software, an add-on to Mathematica, provides a
toolset specially designed for doing mathematical statistics. It
enables students to solve difficult problems by removing the technical
calculations often associated with mathematical statistics. The
professional statistician will be able to tackle tricky multivariate
distributions, generating functions, inversion theorems, symbolic
maximum likelihood estimation, unbiased estimation, and the checking
and correcting of textbook formulas. This text would be a useful
companion for researchers and students in statistics, econometrics,
engineering, physics, psychometrics, economics, finance, biometrics,
and the social sciences.
Companion site mathStatica.com
Posted by omor at 09:48 PM | Comments (0)
August 04, 2005
Information Visualisation with R
The Information Visualisation lecture slides use R.
Sets of lecture slides (PDF), also an 8-up version of the slides suitable for printing.
Posted by omor at 12:09 PM | Comments (0)
July 21, 2005
Asset prices by Enrico De Giorgi
Default models and asset pricing models, some with correlated defaults,
at Enrico De Giorgi's resource page.
Posted by omor at 02:12 PM | Comments (0)
July 19, 2005
SAS PROC QUANTREG for quantile regression
Some PROC QUANTREG features are:
* Implements the simplex, interior point, and smoothing algorithms for
estimation
* Provides three methods to compute confidence intervals for the
regression quantile parameter: sparsity, rank, and resampling.
* Provides two methods to compute the covariance and correlation
matrices of the estimated parameters: an asymptotic method and a
bootstrap method
* Provides two tests for the regression parameter estimates: the Wald
test and a likelihood ratio test
* Uses robust multivariate location and scale estimates for leverage
point detection
* Multithreaded for parallel computing when multiple processors are
available
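Whatever the algorithm (simplex, interior point or smoothing), a regression quantile minimizes the check (pinball) loss. This pure-Python sketch covers the intercept-only case, where the minimizer can be taken at one of the data points. It shows that the result is the sample τ-quantile. The function names are hypothetical, and this is not how PROC QUANTREG is implemented.

```python
def check_loss(residuals, tau):
    """Sum of rho_tau(r) = r * (tau - 1[r < 0]) over the residuals."""
    return sum(r * (tau - (r < 0)) for r in residuals)

def intercept_quantile(y, tau):
    """Intercept-only regression quantile: the data value with the smallest check loss."""
    return min(y, key=lambda c: check_loss([v - c for v in y], tau))
```

`intercept_quantile([7, 3, 9, 1, 5], 0.5)` returns 5, the median. With `tau=0.25` the same data give 3.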
Posted by omor at 04:22 PM | Comments (0)
July 17, 2005
SAS examples with explanation at ucla.edu/stat/SAS/
SAS examples with explanation abound at UCLA: 1, 2.
Posted by omor at 10:38 PM | Comments (0)
July 10, 2005
Array manipulation: Perl Data Language (PDL) and piddles
PDL is designed to COMPACTLY store and SPEEDILY manipulate the large
N-dimensional data sets that are the bread and butter of scientific
computing. For example, $a=$b+$c can add two 2048x2048 images in only
a fraction of a second.
Perl Data Language (PDL), PDL::Impatient - PDL for the impatient
A PDL scalar variable (an instance of a particular class of
Perl object, i.e. a blessed thingie) is a piddle.
Posted by omor at 01:42 PM | Comments (0)
June 17, 2005
state of stats
What have we learnt? State of stats (PDF): Antony Unwin on Statistical Learning.
Global criteria: AIC, BIC, deviance, test error, ...
Local criteria: residuals, diagnostics
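For the global criteria, here is a short Python sketch of AIC and BIC for a least-squares fit with Gaussian errors. It uses the maximized log-likelihood -(n/2)(log(2π·RSS/n) + 1). The helper name is hypothetical. Counting the error variance as a parameter is a convention that some software omits. Omitting it shifts every model's value by the same amount, so comparisons are unaffected.

```python
import math

def gaussian_aic_bic(rss, n, k):
    """AIC and BIC for a least-squares fit with Gaussian errors.

    rss: residual sum of squares; n: number of observations;
    k: number of estimated parameters (here including the error variance).
    """
    log_lik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)  # maximized log-likelihood
    aic = 2 * k - 2 * log_lik
    bic = k * math.log(n) - 2 * log_lik        # heavier penalty once n > e^2
    return aic, bic
```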
Posted by omor at 04:28 PM | Comments (0)
June 16, 2005
Support Vector Machine
An SVM corresponds to a linear method in a very high dimensional feature
space which is nonlinearly related to the input space. It does not
involve any computations in that high dimensional space. By the use of
kernels, all necessary computations are performed directly in input space.
Support vector machines (SVMs) are a method for creating functions from
a set of labeled training data. The function can be a classification
function (the output is binary: is the input in a category?) or a
general regression function.
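The kernel remark can be checked numerically. For 2-D inputs, the degree-2 kernel k(x, z) = (x · z)^2 equals an ordinary inner product in an explicit 3-D feature space. An SVM can therefore work with k in input space and never build the features. The pure-Python function names below are illustrative.

```python
import math

def poly2_kernel(x, z):
    """Degree-2 homogeneous polynomial kernel, computed in input space."""
    return sum(a * b for a, b in zip(x, z)) ** 2

def poly2_features(x):
    """Explicit 3-D feature map for 2-D inputs whose inner product equals poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)
```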
For classification, SVMs operate by finding a hypersurface in the
space of possible inputs. This hypersurface will attempt to split the
positive examples from the negative examples. The split will be chosen
to have the largest distance from the hypersurface to the nearest of
the positive and negative examples. Intuitively, this makes the
classification correct for testing data that is near, but not
identical to the training data.
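This compact NumPy sketch trains a linear soft-margin SVM with Pegasos-style stochastic subgradient steps. It is one common solver, swapped in for illustration; e1071 and SVMlight use different optimizers. For simplicity, the bias is handled as an extra constant feature and so is regularized too. This differs slightly from the textbook formulation.

```python
import numpy as np

def pegasos_svm(X, y, lam=0.01, n_epochs=50, seed=0):
    """Linear SVM via stochastic subgradient descent on the regularized hinge loss.

    X: (n, d) inputs; y: labels in {-1, +1}. Returns (w, b).
    """
    rng = np.random.default_rng(seed)
    Xa = np.hstack([X, np.ones((len(X), 1))])   # constant column acts as the bias
    w = np.zeros(Xa.shape[1])
    t = 0
    for _ in range(n_epochs):
        for i in rng.permutation(len(Xa)):
            t += 1
            eta = 1.0 / (lam * t)               # decreasing step size
            margin = y[i] * (Xa[i] @ w)
            w *= 1.0 - eta * lam                # shrink: gradient of the L2 penalty
            if margin < 1:                      # inside the margin: hinge subgradient
                w += eta * y[i] * Xa[i]
    return w[:-1], w[-1]
```

New points are classified by `np.sign(X_new @ w + b)`.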
R (with package e1071):
estimate, predict, example, example2.
Matlab:
Kernel Methods for Pattern Analysis
John Shawe-Taylor & Nello Cristianini
Cambridge University Press, 2004
Detailed contents, inventory of algorithms and kernels, and Matlab code.

Stand-alone:
SVMlight is an implementation of Support Vector Machines.
More reading:
NUS with article abstracts.
Support vector extended bibliography and software
Recursive SVM, 2, 3.
Classification by Support Vector Machines (Florian Markowetz, Berlin): see part 3 of the PDF.
Statistical Modelling in R (Thomas Lumley): see the last section of the PDF.
Posted by omor at 08:58 PM | Comments (0)
June 15, 2005
Spectral Graph Transducer, SGTlight
SGTlight is an implementation of a Spectral Graph Transducer (SGT)
[Joachims, 2003] in C using Matlab libraries. The SGT is a method for
transductive learning. It solves a normalized-cut (or ratio-cut) problem
with additional constraints for the labeled examples using spectral
methods. The approach is efficient enough to handle datasets with
several tens of thousands of examples.
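SGT's own constrained problem is not reproduced here. This rough NumPy sketch shows the spectral relaxation it builds on. It splits an affinity graph in two using the second eigenvector of the normalized Laplacian, as in the Shi-Malik normalized-cut relaxation. SGT additionally constrains the solution with the labeled examples, which this sketch omits.

```python
import numpy as np

def normalized_cut_split(W):
    """Two-way split from the normalized-cut spectral relaxation.

    W: symmetric (n, n) nonnegative affinity matrix with no isolated nodes.
    Returns a boolean array marking one side of the cut.
    """
    W = np.asarray(W, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}.
    lap = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)               # eigenvalues in ascending order
    # Second-smallest eigenvector, mapped back to (D - W) y = lambda D y.
    y = d_inv_sqrt * vecs[:, 1]
    return y > 0
```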
Posted by omor at 11:55 AM | Comments (0)
June 14, 2005
Analysis of patterns
Automatic pattern analysis of data is a pillar of modern science,
technology and business, with deep roots in statistics, machine
learning, pattern recognition, theoretical computer science, and many
other fields. A unified conceptual understanding of this strategic
field is of utmost importance for researchers as well as for users of
this technology.
This workshop/course will emphasize the common principles and roots
of modern pattern analysis technology, developed independently by many
different scientific communities over the past 30 years, and their
impact on modern science and technology.
Students and researchers from many disciplines dealing with automatic
pattern analysis form the intended audience. These include (but are
not limited to) statistics, pattern recognition, data mining, machine
learning, information theory, sequence analysis, bioinformatics,
adaptive systems, etc.
Italy, October 28 - November 6, 2005
Posted by omor at 11:37 AM | Comments (0)
June 13, 2005
Data mining competition
Fair Isaac and UCSD data mining competition lets you test your predictive power.
Posted by omor at 09:50 PM | Comments (0)
May 03, 2005
Kalman filter with Mathematica
Kalman filter: an algorithm from control theory introduced by R. Kalman in 1960 and
refined by Kalman and R. Bucy. It makes optimal use of imprecise data on a linear
(or nearly linear) system with Gaussian errors to continuously update the best
estimate of the system's current state.
In Mathematica it is available as a time series function (example) and as an
estimator for linear (time series and panel) models with time-varying coefficients.
See also Control System Professional: Kalman Filter, Igor Bakshee, Wolfram Research, Inc.
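The predict/update cycle is easiest to see in the scalar case. This pure-Python sketch filters a random-walk state observed with noise (a local level model). The model, the noise variances `q` and `r` and the diffuse starting variance are illustrative assumptions. This is not the Mathematica implementation.

```python
def kalman_local_level(observations, q, r, x0=0.0, p0=1e6):
    """Scalar Kalman filter.

    State:       x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
    Observation: y_t = x_t + v_t,      v_t ~ N(0, r)
    Returns the filtered state estimate after each observation.
    """
    x, p = x0, p0                      # large p0 = vague initial knowledge
    estimates = []
    for y in observations:
        p = p + q                      # predict: random walk keeps the mean, adds variance
        k = p / (p + r)                # Kalman gain: weight given to the new observation
        x = x + k * (y - x)            # update the estimate toward the observation
        p = (1 - k) * p                # updated (smaller) uncertainty
        estimates.append(x)
    return estimates
```

With `q=0` the state is constant, and the filtered estimate is essentially the running mean of the observations.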
Posted by omor at 08:54 PM | Comments (0)
April 29, 2005
Decision Science News / Dan Goldstein
Decision Science News, by Dan Goldstein and Kevin Flora, covers the
decision sciences, including but not limited to psychology, economics,
business, medicine, and law, but mostly marketing.
Posted by omor at 08:20 PM | Comments (0)
April 28, 2005
Statistical Modeling, Causal Inference / MLM
Statistical Modeling, Causal Inference, and Social Science (MLM), a blog by
Andrew Gelman and Samantha Cook at Columbia.
Posted by omor at 01:33 AM | Comments (0)
April 27, 2005
XLISP-Stat estimates Generalised Estimating Equations
XLISP-Stat tools for building Generalised Estimating Equation models
offers an introduction to GEE models.
Much of the brain trust of XLISP-Stat has moved on to R.
Generalised Estimating Equations models, proposed by Liang and Zeger
in 1986, are probably the simplest method for analysing data collected
in groups where observations within a group may be correlated but
observations in separate groups are independent. A complete
description of the method is given in their two 1986 papers. The basic
principle of the method is a generalisation of the fact that weighted
least squares analyses give unbiased parameter estimates no matter
what weights are used. Generalised linear models, such as logistic
regression, have similar robustness properties, giving asymptotically
correct parameter estimates even when the data are correlated. This
means that it is possible to estimate regression parameters using any
convenient or plausible assumptions about the true correlation between
observations and get the right answer even when the assumptions are
not correct.
It is only necessary to use a "model-robust" or "agnostic"
estimate of the standard errors. It would be unreasonable to expect
this freedom of choice to be without cost, and it turns out that there
is a moderate gain in efficiency from choosing a working
correlation structure close to the true one.
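This NumPy sketch covers the simplest case. It fits a linear model with an independence working correlation, so the estimating equation is ordinary least squares. It pairs that with the model-robust (sandwich) standard error, which sums scores within each group before squaring. The function name is hypothetical, the linear identity-link model is an assumption, and no small-sample correction is applied.

```python
import numpy as np

def gee_independence_linear(X, y, groups):
    """Linear GEE with independence working correlation and sandwich standard errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    # Estimating equation X'(y - X beta) = 0, i.e. least squares.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    bread = np.linalg.inv(X.T @ X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        idx = groups == g
        score = X[idx].T @ resid[idx]          # group-level score: correlation stays inside
        meat += np.outer(score, score)
    cov = bread @ meat @ bread                 # "agnostic" sandwich covariance
    return beta, np.sqrt(np.diag(cov))
```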
Useful references include the two original papers (Zeger & Liang 1986,
Liang & Zeger 1986) and two recent books: Diggle, Liang & Zeger (1993)
and Fahrmeir & Tutz (1995). As far as I know the most elementary
treatment anywhere in the literature is still Zeger & Liang (1986).
Section 2 gives an overview of the theory and use of Generalised
Estimating Equations. Section 3 describes how to use the Lisp-Stat
code, including diagnostics. Finally there is a brief discussion of
missing data handling and of other software for fitting GEE models.
Appendix A describes some aspects of the implementation, including the
global variables (Table 5) that control many program options.
Posted by omor at 08:02 PM | Comments (0)
April 19, 2005
R Graphics, Paul Murrell
Update 2005 Sept 03: R Graphics is shipping!
A book on the core graphics facilities of the R language and
environment for statistical computing and graphics (to be published
by Chapman & Hall/CRC in August 2005). Preview now.
Posted by omor at 01:33 AM | Comments (0)
March 25, 2005
Wavelets
Wavelets are mathematical expansions that transform data from the
time domain into different layers of frequency. Compared to
standard Fourier analysis, they have the advantage [PDF] of being
localized both in time and in frequency, and they enable the
researcher to observe and analyze data at different scales.
Wavelets for Economists [PDF]
Christoph Schleicher, Bank of Canada / Banque du Canada
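This pure-Python sketch uses the simplest wavelet, the orthonormal Haar wavelet. Each level splits the series into coarser averages and local differences. A change in the data therefore shows up only in the few detail coefficients near where it happens. Haar is used here only for brevity, and the function names are illustrative.

```python
import math

S = 1 / math.sqrt(2)   # orthonormal Haar scaling

def haar_decompose(x, levels):
    """Return (coarse approximation, [detail coefficients, finest scale first])."""
    approx = list(x)   # length must be divisible by 2**levels
    details = []
    for _ in range(levels):
        pairs = range(0, len(approx), 2)
        details.append([S * (approx[i] - approx[i + 1]) for i in pairs])  # local changes
        approx = [S * (approx[i] + approx[i + 1]) for i in pairs]         # smoother layer
    return approx, details

def haar_reconstruct(approx, details):
    """Invert haar_decompose exactly."""
    x = list(approx)
    for d in reversed(details):
        out = []
        for a, b in zip(x, d):
            out += [S * (a + b), S * (a - b)]
        x = out
    return x
```

Because the transform is orthonormal, the sum of squared coefficients equals the sum of squares of the data.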
Posted by omor at 11:59 PM | Comments (0)
February 01, 2005
Exploratory Data Analysis in NIST's Statistics Handbook
NIST's Engineering Statistics Handbook: Exploratory Data Analysis.
Posted by omor at 10:43 PM | Comments (0)
January 31, 2005
Basel default
Probability of Default (PD)
- the probability that a specific customer will default
within the next 12 months.
Loss Given Default (LGD)
- the percentage of each credit facility that will be lost
if the customer defaults.
Exposure at Default (EAD)
- the expected exposure for each credit facility in the
event of a default.
PD, LGD, and EAD are key measures used by Basel II: Peldec.
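The three measures combine into the Basel II expected loss, EL = PD × LGD × EAD. Here is a pure-Python sketch with made-up facility figures; the function names are hypothetical.

```python
def expected_loss(pd, lgd, ead):
    """Expected loss of one facility: default probability x loss share x exposure."""
    return pd * lgd * ead

def portfolio_expected_loss(facilities):
    """Sum expected loss over (pd, lgd, ead) tuples."""
    return sum(expected_loss(pd, lgd, ead) for pd, lgd, ead in facilities)
```

A facility with a 2% PD, 45% LGD and an EAD of 1,000,000 has an expected loss of 9,000, up to floating-point rounding.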
Posted by omor at 01:45 PM | Comments (0)
January 29, 2005
How Ratings Agencies Achieve Rating Stability
Surveys on the use of agency credit ratings reveal that some
investors believe that rating agencies are relatively slow in
adjusting their ratings. A well-accepted explanation for this
perception of the timeliness of ratings is the "through-the-cycle"
methodology that agencies use. According to Moody's, through-the-cycle
ratings are stable because they are intended to measure default risk
over long investment horizons, and because they are
changed only when agencies are confident that observed changes in a
company's risk profile are likely to be permanent. To verify this
explanation, we quantify the impact of the long-term default horizon
and the prudent migration policy on rating stability from the
perspective of an investor with no desire for rating stability. This
is done by benchmarking agency ratings with a financial ratio-based
(credit scoring) agency-rating prediction model and (credit scoring)
default-prediction models of various time horizons. We also examine
rating migration practices. The result is a better quantitative
understanding of the through-the-cycle methodology.
By varying the time horizon in the estimation of default-prediction
models, we search for a best match with the agency-rating prediction
model. Consistent with the agencies' stated objectives, we conclude
that agency ratings are focused on the long term. In contrast to
one-year default prediction models, agency ratings place less weight
on short-term indicators of credit quality.
We also demonstrate that the focus of agencies on long investment
horizons explains only part of the relative stability of agency
ratings. The other aspect of the through-the-cycle rating methodology,
the agencies' rating-migration policy, is an even more important factor
underlying the stability of agency ratings. We find that rating
migrations are triggered when the difference between the actual agency
rating and the model predicted rating exceeds a certain threshold
level. When rating migrations are triggered, agencies adjust their
ratings only partially, consistent with the known serial dependency of
agency rating migrations.
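The migration policy described above can be mimicked with a stylized rule. The rating moves only when it differs from the model-predicted rating by more than a threshold, and then it closes only part of the gap. This pure-Python sketch works on a numeric rating scale. The threshold and adjustment fraction are hypothetical parameters, not estimates from the paper.

```python
def partial_adjustment_path(model_ratings, start, threshold, adjustment):
    """Stylized through-the-cycle migration rule on a numeric rating scale.

    model_ratings: model-predicted ratings over time (hypothetical input).
    threshold: gap that must be exceeded before the rating moves.
    adjustment: fraction of the gap closed when a migration is triggered.
    """
    rating = start
    path = []
    for target in model_ratings:
        gap = target - rating
        if abs(gap) > threshold:           # migration triggered
            rating += adjustment * gap     # partial adjustment only
        path.append(rating)
    return path
```

Starting at 10 with model ratings 10, 10.5, 11, 14, a threshold of 1.5 and an adjustment of 0.5, the path is 10, 10, 10, 12. If the model rating then stays at 14, the next period moves the rating again, to 13. One shock thus produces a run of same-direction migrations.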
How Ratings Agencies Achieve Rating Stability
by Edward I. Altman of New York University, and
Herbert A. Rijken of Vrije Universiteit Amsterdam
Posted by omor at 01:59 PM | Comments (0)
January 26, 2005
webMathematica takes derivatives
webMathematica takes derivatives.
Posted by omor at 05:47 PM | Comments (0)
January 23, 2005
TreeAge statistical software for non-statisticians
TreeAge offers statistical software for non-statisticians.
Features include sensitivity analysis and distribution graphs.
Posted by omor at 11:57 PM | Comments (0)
January 22, 2005
Belief Networks and Decision Networks
Belief networks (also known as Bayesian networks, Bayes networks and
causal probabilistic networks) provide a method to represent
relationships between propositions or variables, even if the
relationships involve uncertainty, unpredictability or imprecision.
They may be learned automatically from data files, created by an
expert, or developed by a combination of the two. They capture
knowledge in a modular form that can be transported from one situation
to another; it is a form people can understand, and which allows a
clear visualization of the relationships involved.
By adding decision variables (things that can be controlled), and
utility variables (things we want to optimize) to the relationships of
a belief network, a decision network (also known as an influence
diagram) is formed. This can be used to find optimal decisions,
control systems, or plans.
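A tiny belief network, queried by brute-force enumeration in pure Python, shows how the network's local tables answer a diagnostic question. The network (Rain and Sprinkler both causing WetGrass) and all of its probabilities are made up for illustration. Real belief-network tools use far more efficient inference algorithms.

```python
from itertools import product

P_RAIN = 0.2
P_SPRINKLER = 0.1
# P(WetGrass = true | Rain, Sprinkler)
P_WET = {(True, True): 0.99, (True, False): 0.9,
         (False, True): 0.8, (False, False): 0.0}

def joint(rain, sprinkler, wet):
    """Joint probability as the product of each node's local table."""
    p = (P_RAIN if rain else 1 - P_RAIN) * (P_SPRINKLER if sprinkler else 1 - P_SPRINKLER)
    p_wet = P_WET[(rain, sprinkler)]
    return p * (p_wet if wet else 1 - p_wet)

def posterior_rain_given_wet():
    """P(Rain | WetGrass) by summing the joint over the unobserved Sprinkler node."""
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return num / den
```

Seeing wet grass raises the probability of rain from the prior 0.2 to about 0.74.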

Norsys Bayesian belief network software, based in Vancouver, Canada.
Posted by omor at 01:52 PM | Comments (0)
January 21, 2005
AgenaRisk Bayesian network
AgenaRisk Bayesian network analysis software and white papers.
Posted by omor at 01:47 PM | Comments (0)
January 14, 2005
Bayesian Methods for Improving Credit Scoring Models
Abstract: We propose a Bayesian methodology that enables banks with
small datasets to improve their default probability estimates by
imposing prior information. As prior information, we use coefficients
from credit scoring models estimated on other datasets. Through
simulations, we explore the default prediction power of three Bayesian
estimators in three different scenarios and find that all three
perform better than standard maximum likelihood estimates. We
therefore recommend that banks consider Bayesian estimation for
internal and regulatory default prediction models.
Keywords: Credit Ratings, Rating Agency, Bayesian Inference, Basel II
JEL Classification: C11, G21, G33
Bayesian Methods for Improving Credit Scoring Models
by Gunter Löffler of the University of Ulm,
Peter N. Posch of the University of Ulm, and
Christiane Schoene of the University of Ulm
Posted 2004 December 16.
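This NumPy sketch shows the basic mechanism. A logistic-regression default model gets a normal prior centred on coefficients borrowed from another dataset. It is then fitted to the posterior mode (MAP) by Newton's method. The paper compares three Bayesian estimators that may differ from this one. The prior variance, function name and data here are assumptions.

```python
import numpy as np

def logistic_map(X, y, prior_mean, prior_var, n_iter=50):
    """MAP logistic regression with independent N(prior_mean, prior_var) coefficient priors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    m = np.asarray(prior_mean, dtype=float)    # e.g. coefficients from another bank's model
    prec = np.eye(len(m)) / prior_var          # prior precision
    beta = m.copy()
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))    # predicted default probabilities
        grad = X.T @ (y - p) - prec @ (beta - m)
        hess = X.T @ (X * (p * (1 - p))[:, None]) + prec
        beta = beta + np.linalg.solve(hess, grad)   # Newton step on the log posterior
    return beta
```

With a very tight prior, the estimate stays at the borrowed coefficients. With a looser prior, a small dataset moves it, yet the prior keeps it finite even when the data alone would separate defaulters perfectly.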
Posted by omor at 03:25 PM | Comments (0)
January 13, 2005
Receiver Operating Characteristic (ROC)
ROC.
The ability of a test to discriminate diseased cases from normal cases
is evaluated using Receiver Operating Characteristic (ROC) curve
analysis (Metz, 1978; Zweig & Campbell, 1993). ROC curves can also be
used to compare the diagnostic performance of two or more laboratory or
diagnostic tests (Griner et al., 1981).
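This pure-Python sketch shows ROC analysis. It sweeps a threshold over test scores to get (false positive rate, true positive rate) pairs. It then computes the area under the curve from its Mann-Whitney interpretation: the chance that a random diseased case scores higher than a random normal case, with ties counting one half. The function names and scores are hypothetical.

```python
def roc_points(pos_scores, neg_scores):
    """(FPR, TPR) pairs as the threshold sweeps from high to low (higher = more likely diseased)."""
    points = [(0.0, 0.0)]
    for t in sorted(set(pos_scores) | set(neg_scores), reverse=True):
        tpr = sum(s >= t for s in pos_scores) / len(pos_scores)
        fpr = sum(s >= t for s in neg_scores) / len(neg_scores)
        points.append((fpr, tpr))
    return points

def auc_mann_whitney(pos_scores, neg_scores):
    """P(random diseased score > random normal score), ties counted as 1/2."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

The trapezoidal area under `roc_points` equals `auc_mann_whitney`. Two tests can be compared by their curves or by their areas.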
Posted by omor at 12:44 PM | Comments (0)
January 12, 2005
Lindeberg's central limit theorem
Lindeberg's Central Limit Theorem at PlanetMath.
Posted by omor at 07:13 PM | Comments (0)
January 07, 2005
TreeBoost - Stochastic Gradient Boosting
TreeBoost - Stochastic Gradient Boosting.
"Boosting" is a technique for improving the accuracy of a predictive
function by applying the function repeatedly in a series and combining
the output of each function with weighting so that the total error of
the prediction is minimized. In many cases, the predictive accuracy of
such a series greatly exceeds the accuracy of the base function used
alone.
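This pure-Python sketch shows stochastic gradient boosting for squared error. Each round fits a one-split tree (a stump) to the current residuals on a random half of the data. It then adds a shrunken copy of that stump to the model. Stumps, the learning rate and the subsampling fraction are illustrative choices. TreeBoost's own tree size and settings may differ.

```python
import random

def fit_stump(x, r):
    """Least-squares one-split tree on 1-D inputs x for targets r."""
    best = None
    for t in sorted(set(x))[:-1]:                    # candidate split points
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda v: lm if v <= t else rm

def stochastic_boost(x, y, n_rounds=100, learning_rate=0.1, subsample=0.5, seed=0):
    """Returns (base, learning_rate, stumps) for use with boost_predict."""
    rng = random.Random(seed)
    base = sum(y) / len(y)                           # start from the mean
    pred = [base] * len(y)
    stumps = []
    size = max(2, int(subsample * len(x)))
    for _ in range(n_rounds):
        idx = rng.sample(range(len(x)), size)        # random subsample without replacement
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:                         # nothing to split on this round
            continue
        rs = [y[i] - pred[i] for i in idx]           # residuals = negative gradient
        stump = fit_stump(xs, rs)
        stumps.append(stump)
        pred = [p + learning_rate * stump(xi) for p, xi in zip(pred, x)]
    return base, learning_rate, stumps

def boost_predict(model, v):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(s(v) for s in stumps)
```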
Posted by omor at 09:59 PM | Comments (0)
January 02, 2005
Correlation Monger
Correlation Monger provides pairwise correlations of
demographic variables across the 50 US states. For example,
Canadians increase property values.
Posted by omor at 12:18 AM | Comments (0)
December 17, 2004
MedCalc basic statistical features.
MedCalc has a good list of basic statistical features.
# Stepwise Multiple regression
# Stepwise Logistic regression
# Paired and unpaired t-tests
# Rank sum tests: Wilcoxon test (paired data), Mann-Whitney U test (unpaired data)
# Variance ratio test (F-test)
# One-way analysis of variance (ANOVA) with Student-Newman-Keuls (SNK) test for pairwise comparison of subgroups
# Two-way analysis of variance
# Kruskal-Wallis test
# Frequencies table, crosstabulation analysis, Chi-square test, Chi-square test for trend
# Tests on 2x2 tables: Fisher's exact test, McNemar test
# Frequencies bar charts
# Kaplan-Meier survival curve, logrank test for comparison of survival curves, hazard ratio, logrank test for trend
# Cox proportional-hazards regression
# Meta-analysis: odds ratio (random effects or fixed effects model - Mantel-Haenszel method); summary effects for continuous outcomes; Forest plot
# Reference interval (normal range)
# Analysis of Serial measurements with group comparison
# Bland & Altman plot for method comparison (bias plot) - repeatability
Posted by omor at 08:49 PM | Comments (0)
December 09, 2004
Combining trees with CART
Salford CART allows one to choose from several ways of combining
separate CART trees into a single predictive engine. The
trees are combined by either averaging their outputs for
regression or by using an unweighted plurality voting scheme
for classification. The current version of CART offers two
combination methods: Bootstrap aggregation and ARCing. Each
generates a set of trees by resampling (with replacement)
from the original training data.
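This pure-Python sketch shows bootstrap aggregation with an unweighted plurality vote. Single-split classifiers stand in for full CART trees to keep it short. The helper names are hypothetical, and this is not Salford's implementation.

```python
import random
from collections import Counter

def fit_split_classifier(x, y):
    """Stand-in base learner: one threshold on 1-D x, majority label on each side."""
    best = None
    for t in sorted(set(x)):
        left = Counter(yi for xi, yi in zip(x, y) if xi <= t)
        right = Counter(yi for xi, yi in zip(x, y) if xi > t)
        left_label = left.most_common(1)[0][0]
        right_label = right.most_common(1)[0][0] if right else left_label
        errors = (sum(left.values()) - left[left_label]) + (sum(right.values()) - right[right_label])
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    return lambda v: left_label if v <= t else right_label

def bagged_classifier(x, y, n_members=25, seed=0):
    """Bootstrap aggregation: fit members on resamples, predict by plurality vote."""
    rng = random.Random(seed)
    n = len(x)
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        members.append(fit_split_classifier([x[i] for i in idx], [y[i] for i in idx]))

    def predict(v):
        votes = Counter(member(v) for member in members)   # unweighted plurality vote
        return votes.most_common(1)[0][0]

    return predict
```

For regression, the same loop would average the members' numeric outputs instead of voting.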
Posted by omor at 11:05 PM | Comments (0)
December 07, 2004
S-PLUS Predictive Modeling and Computational Finance
S-PLUS Predictive Modeling and Computational Finance
event with abstracts.
Nov 2004 Finance Event Proceedings for LossCalc II: Dynamic Prediction of LGD.
Greg Gupton, Moody's KMV
We describe LossCalc(tm) version 2.0, the Moody's KMV model to predict
loss given default (LGD). LGD is of natural interest to lenders and
investors wishing to estimate future credit losses. LossCalc is a
robust and validated model of LGD for loans and bonds globally.
LossCalc is a statistical model that incorporates information at all levels:
collateral, instrument, firm, industry, country, and the macroeconomy
to predict LGD. Perhaps more interesting than merely having a powerful
predictive model is seeing and understanding the underlying drivers of
default recovery/loss that we show.
Predictive Modeling for Property & Casualty Pricing Decisions
Jeremy Stanley, Ernst & Young
This presentation focuses on the application of predictive modeling
methodologies to pricing decisions for property & casualty insurance
lines. Predicting the probability of an insured having one or more
claims in a policy period is a key ingredient to determining the price
a carrier will charge. This presentation will compare and contrast
three types of models applied to this problem: generalized linear
models (GLMs), generalized additive models (GAMs) and neural networks.
GAMs allow for non-linearity in the additive terms and limited types
of specified interactions, requiring an intensive modeling effort to
determine the appropriate model structure. GAMs benefit from fast
model fitting performance, robust measures of in-sample error (such as
the Akaike Information Criterion) and can be easily translated into a
multiplicative rating plan. Neural networks, through the control of
the number of optimization iterations, the size of the hidden layer,
and the use of a weight decay parameter, allow for the near-automatic
selection of model architecture, simultaneously encompassing
interaction terms and complex non-linearities. The predictions of
neural networks are difficult to visualize in high dimensions or with
more than two continuous factors, and are not easily translated into a
multiplicative rating plan. Model performance will be compared in
S-PLUS via cross-validation and bootstrap methods, and visualized with
the use of ROC curves and lift charts. Model structure will be
visualized with S-PLUS Trellis Plots, leading to insights that can
improve the selected model structure.
Posted by omor at 01:59 PM | Comments (0)
December 02, 2004
Edward Malthouse, data mining
Edward Malthouse's data mining course (DM).
Posted by omor at 02:17 PM | Comments (0)
November 25, 2004
r project for statistical computing
The R Project for Statistical Computing is an open-source companion
to S and S-PLUS, a successor to XLispStat, and
more.
Whereas SAS and SPSS will give copious output from a regression
or discriminant analysis, R will give minimal output and store the
results in a fit object for subsequent interrogation by further R
functions.
manuals [HTML]
Sample R session
R is an integrated suite of software facilities for data
manipulation, calculation and graphical display. Among
other things it has
* an effective data handling and storage facility,
* a suite of operators for calculations on arrays,
in particular matrices,
* a large, coherent, integrated collection of intermediate
tools for data analysis,
* graphical facilities for data analysis and display
either directly at the computer or on hardcopy.
* a well developed, simple and effective programming
language (called 'S') which includes conditionals,
loops, user defined recursive functions and input
and output facilities. (Indeed most of the system
supplied functions are themselves written in the
S language.)
Posted by omor at 02:17 PM | Comments (0)




