
Statistics Electronic Textbook in English:
StatSoft
This Electronic Statistics Textbook offers training in the understanding and application of statistics. The material was developed at the StatSoft R&D department based on many years of teaching undergraduate and graduate statistics courses and covers a wide variety of applications, including laboratory research (biomedical, agricultural, etc.), business statistics and forecasting, social science statistics and survey research, data mining, engineering and quality control applications, and many others.
The Electronic Textbook begins with an overview of the relevant elementary (pivotal) concepts and continues with a more in-depth exploration of specific areas of statistics, organized by "modules," accessible by buttons, representing classes of analytic techniques. A glossary of statistical terms and a list of references for further study are included.
Fulltext: ZIPed HTML EBook

Coupled Two-Way Clustering Analysis of Breast Cancer and Colon Cancer Gene Expression Data
Authors: Gad Getz, Hilah Gal, Itai Kela, Eytan Domany (Weizmann Inst. of Science), Dan A. Notterman (Robert Wood Johnson Medical School and Princeton University)
Comments: 9 pages, 4 figures
Subjclass: Biological Physics
We present and review Coupled Two-Way Clustering, a method designed to mine gene expression data. The method identifies submatrices of the total expression matrix, whose clustering analysis reveals partitions of samples (and genes) into biologically relevant classes. We demonstrate, on data from colon and breast cancer, that we are able to identify partitions that elude standard clustering analysis.
Fulltext: PostScript, PDF, or Other formats

XploRe: The Interactive Statistical Computing Environment
Author: Wolfgang Härdle
Overview of statistical resources on the Web.
Fulltext: PDF

A Cluster Analysis Approach to Financial Structure in Small Firms in the United States
Author: Brian Gibson
Department of Accounting and Finance, University of Newcastle, Australia
This paper uses data from the 1993 National Survey of Small Business Finances sponsored by the Board of Governors of the Federal Reserve Board and the U.S. Small Business Administration to explore the equity and debt structure of small firms in the United States. Initially the paper provides descriptive detail of financial structure that is then used in a cluster analysis process to help identify a range of possibly "typical" financial structures. Results are also presented of exploratory analysis that seeks to identify associations with a range of variables posited to influence such structures. Included are industry, age, profit (measured in absolute and relative terms), sales growth, asset structure, and size (measured by sales and employee numbers). Results are generally supportive of the general agency cost explanations of financial structure in small firms.
Fulltext: PDF

The determinants of stock returns: An analysis of industrial sector indices
Authors: Séverine CAUCHIE, Martin HOESLI, Dusan ISAKOV
HEC-University of Geneva
HEC-University of Geneva, International Center FAME and University of Aberdeen (Department of Accountancy and Finance)
HEC-University of Geneva and International Center FAME
Keywords: Arbitrage Pricing Theory, stock returns, principal component analysis
JEL Classification: G12, G15
This paper investigates the determinants of stock returns in a small open economy in an Arbitrage Pricing Theory framework. The analysis is conducted with monthly data from the Swiss stock market over the period 1986-2000. We use data on industrial sector indices, as well as macroeconomic data. Both a statistical and a macroeconomic implementation of the model are provided. We find that Swiss equity returns are influenced by both global and domestic economic conditions. The results also show that the statistically determined factors yield a better representation of the determinants of stock returns than the macroeconomic variables.
Fulltext: PDF

A STATISTICAL ANALYSIS OF BANKS IN ARMENIA
Authors: Cyrus Safdari, Ph.D., Nancy J. Scannell, Ph.D., Rubina Ohanian, Ph.D.
Prepared for the Armenian International Policy Research Group conference Armenia: Recent Economic Trends and Growth Prospects
Financial data for banks operating in Armenia in 2001 were extracted from Arka News Agency publications which use the variable, Weight Share of Assets, to rank and classify the country's 31 banks. The present study employs statistical procedures, including Factor Analysis and Cluster Analysis, applied to a selected subset of banks, namely those for which complete data was available. One conclusion drawn from the study corroborates Arka's use of the Weight Share of Assets to classify banks. Further analysis determined various cutoff points for Weight Share values used to delineate bank peer groups. Peer grouping is an effective tool to perform comparative analyses among banks or other entities.
Fulltext: PDF

Cluster Analysis of Gene Expression Data
Author: Eytan Domany (Weizmann Inst. of Science)
Comments: 18 pages, 6 figures
Subjclass: Biological Physics
The expression levels of many thousands of genes can be measured simultaneously by DNA microarrays (chips). This novel experimental tool has revolutionized research in molecular biology and generated considerable excitement. A typical experiment uses a few tens of such chips, each dedicated to a single sample, such as tissue extracted from a particular tumor. The results of such an experiment contain several hundred thousand numbers that come in the form of a table of several thousand rows (one for each gene) and 50-100 columns (one for each sample). We developed a clustering methodology to mine such data. In this review I provide a very basic introduction to the subject, aimed at a physics audience with no prior knowledge of either gene expression or clustering methods. I explain what genes are, what gene expression is and how it is measured by DNA chips. Next I explain what is meant by "clustering" and how we analyze the massive amounts of data from such experiments, and present results obtained from analysis of data obtained from colon cancer, brain tumors and breast cancer.
Fulltext: PostScript, PDF, or Other formats

Integration of a Regression Analysis with Association Rules for Effective Data Mining
Authors: Kwang B. Lee and Sang C. Suh
Department of Computer Science, Texas A&M University-Commerce, Commerce, Texas 75429-3011
Data mining, or knowledge discovery in databases, is the search for relationships and global patterns that exist but are hidden in large databases. Many different methods have been proposed so far; one among them is data mining of association rules in market data. Association rules, whose significance is measured via support and confidence factors, are intended to identify rules of the type "A customer buying item X often also buys item Y." In this paper, we introduce an effective data mining process that combines the concept of association rules with the statistical regression analysis method. We propose measuring the significance of associations via regression analysis, which will eventually lead to a measure that can be used to uncover affinities among the collection of items, enabling us to reduce the mining problems of searching for the support factor and to avoid using Boolean attributes.
Fulltext: PDF
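
The support and confidence factors mentioned in the abstract above can be illustrated with a minimal sketch. It uses the standard textbook definitions, not the regression-based measure the paper proposes; the function names and the five toy baskets are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, Iterable, List


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for basket in transactions if items <= basket)
    return hits / len(transactions)


def confidence(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> float:
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(transactions, antecedent)
    if base == 0.0:
        return 0.0
    both = frozenset(antecedent) | frozenset(consequent)
    return support(transactions, both) / base


# Hypothetical market baskets (made-up data for illustration only).
baskets = [
    frozenset(b)
    for b in ({"X", "Y"}, {"X", "Y", "Z"}, {"X"}, {"Y", "Z"}, {"Z"})
]

rule_support = support(baskets, {"X", "Y"})          # 2 of 5 baskets hold both X and Y
rule_confidence = confidence(baskets, {"X"}, {"Y"})  # 2 of the 3 baskets with X also hold Y
```

A rule such as "X implies Y" is usually kept only when both values exceed user-chosen thresholds.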

Rweb: Web-based Statistical Analysis
Author: Jeff Banfield
Department of Mathematical Science, Montana State University, Bozeman, MT 59717
Rweb is a freely accessible statistical analysis environment that is delivered through the World Wide Web (WWW). It is based on R, a well-known statistical analysis package. The only requirement to run the basic Rweb interface is a WWW browser that supports forms. If you want graphical output you must, of course, have a browser that supports graphics. The interface provides access to WWW accessible data sets, so you may run Rweb on your own data. Rweb can provide a four-window statistical computing environment (code input, text output, graphical output, and error information) through browsers that support Javascript. There is also a set of point-and-click modules under development for use in introductory statistics courses.
Fulltext: PDF

SELECTION OF INDEPENDENT FACTOR MODEL IN FINANCE
Authors: Lai-Wan Chan and Siu-Ming Cha
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong.
In finance, the factor model is a fundamental model to describe the return generation process. Traditionally, the factors are assumed to be uncorrelated with each other. We argue that independence is a better assumption for the factor model from the viewpoint of portfolio management. Based on this assumption, we propose the independent factor model. As the factors are independent, construction of the model would be another application of Independent Component Analysis (ICA) in finance. In this paper, we illustrate how we select the factors in the independent factor models. Securities in the Hong Kong market were used in the experiment. Minimum description length (MDL) was used to select the number of factors. We examine four sorting criteria for factor selection. The resultant models were cross-examined by the runs test.
Fulltext: PDF

A Multiple Factor Model for European Stocks
Authors: Thomas G. Stephan, Raimond Maurer and Martin Dürr
Deutscher Investment Trust (DIT),Frankfurt/Main
Johann Wolfgang Goethe University of Frankfurt/Main, Chair for Investment, Portfolio Management and Pension Systems
Fulltext: PostScript, PDF, or Other formats

Financial Markets, Very Noisy Information Processing
Authors: Malik Magdon-Ismail, Alexander Nicholson and Yaser Abu-Mostafa
Proceedings of the IEEE, Special Issue on Intelligent Signal Processing, Nov. 1998.
Keywords: Learning, Noise, Convergence, Bounds, Test Error, Generalization Error, Model Limitation, Volatility.
We report new results about the impact of noise on information processing, with application to financial markets. These results quantify the tradeoff between the amount of data and the noise level in the data. They also provide estimates for the performance of a learning system in terms of the noise level. We use these results to derive a method for detecting the change in market volatility from period to period. We successfully apply these results to the four major foreign exchange markets. The results hold for linear as well as nonlinear learning models and algorithms, and for different noise models.
Fulltext: PostScript

COUNTRY AND SIZE EFFECTS IN FINANCIAL RATIOS: A EUROPEAN PERSPECTIVE
Authors: C. Serrano Cinca, C. Mar Molinero, J.L. Gallizo Larraz
Department of Accounting and Finance, University of Zaragoza, Spain.
Department of Management, University of Southampton, UK.
Keywords: BACH database, firm size, Central Balance Sheet, three-way scaling, INDSCAL, financial statement analysis, European business evolution.
Harmonised aggregate financial statements are published by the European Commission in the BACH database. This information is organized by country, size of firm, and year. Financial ratios obtained from this database are analysed using multivariate statistical techniques in order to explore country and size effects. The data relate to three size groups, eleven countries, fourteen years, and fifteen financial ratios. It is found that financial ratios reflect the size of the firm, but that the way in which this is reflected varies between the different countries. It is also found that there are no significant size-related differences in financial profitability, but that such differences appear when countries are compared. Important regularities are found over time. Some time effects are also found in the way countries react to the business cycle.
Fulltext: PDF

White Noise Tests and APT Economic Factor Syntheses Using Temporal Factor Analysis
Authors: Kai Chun Chiu and Lei Xu
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, P.R. China.
The well-known Arbitrage Pricing Theory (APT) in finance relates security returns to variations of several economic factors. Previous success using Temporal Factor Analysis (TFA) to extract statistically uncorrelated factors has shed light on the possibility of further identification of hidden driving economic factors. In this paper, we will first perform a white noise test on the residuals of the TFA model for model adequacy. Next, we will show how correlated economic factors can be synthesized from uncorrelated temporal Gaussian factors in TFA.
Fulltext: PDF

Statistical Sampling and Regression Analysis for RT-Level Power Evaluation
Authors: Cheng-Ta Hsieh, Qing Wu, Chih-Shun Ding, Massoud Pedram
Department of Electrical Engineering-Systems, University of Southern California, Los Angeles, CA 90089
In this paper, we propose a statistical power evaluation framework at the RT-level. We first discuss the power macromodeling formulation, and then propose a simple random sampling technique to alleviate the overhead of macromodeling during RTL simulation. Next, we describe a regression estimator to reduce the error of the macromodeling approach. Experimental results indicate that the execution time of the simple random sampling combined with power macromodeling is 50X lower than that of conventional macromodeling, while the percentage error of regression estimation combined with power macromodeling is 16X lower than that of conventional macromodeling. Hence, we provide the designer with options to improve either the accuracy or the execution time when using power macromodeling in the context of RTL simulation.
Fulltext: PDF
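
As a rough illustration of how simple random sampling and a regression estimator can work together, the sketch below uses the textbook survey-sampling regression estimator. It is an assumption that this resembles the estimator the abstract above refers to; the function name, the variable names and the synthetic data are hypothetical. A cheap auxiliary value (standing in for a macromodel estimate) is known for every unit, while the accurate value is measured only on a small random sample.

```python
import random
from statistics import mean
from typing import List, Sequence


def regression_estimate(
    x_all: Sequence[float], sample_idx: Sequence[int], y_sample: Sequence[float]
) -> float:
    """Textbook regression estimator of the population mean of y.

    x_all      -- cheap auxiliary value, known for every unit in the population
    sample_idx -- indices of the units drawn by simple random sampling
    y_sample   -- accurate value, measured only for the sampled units
    """
    x_s: List[float] = [x_all[i] for i in sample_idx]
    x_bar_s, y_bar_s = mean(x_s), mean(y_sample)
    sxx = sum((x - x_bar_s) ** 2 for x in x_s)
    sxy = sum((x - x_bar_s) * (y - y_bar_s) for x, y in zip(x_s, y_sample))
    slope = sxy / sxx if sxx > 0 else 0.0
    # Correct the sample mean of y by how far the sample's x mean
    # sits from the known population x mean.
    return y_bar_s + slope * (mean(x_all) - x_bar_s)


# Synthetic, made-up data: y is an exact linear function of x so that the
# behaviour of the estimator is easy to check; real data would be noisy.
rng = random.Random(0)
x_all = [rng.uniform(1.0, 10.0) for _ in range(1000)]
true_y = [0.5 + 1.2 * x for x in x_all]

sample_idx = rng.sample(range(len(x_all)), 30)  # simple random sample, n = 30
y_sample = [true_y[i] for i in sample_idx]
estimate = regression_estimate(x_all, sample_idx, y_sample)
```

When y depends exactly linearly on x, as in this toy data, the estimator recovers the population mean of y from the sample alone; with noisy data it only reduces the error relative to the plain sample mean to the extent that x and y are correlated.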

Higher Education Finance Variables: An Analysis
Author: Deagelia M. Peña
THE NEA 2000 ALMANAC OF HIGHER EDUCATION
How should America provide postsecondary education for growing numbers of students? States, notes one observer, face a fundamental choice: Limit the number of college students served or serve more students more effectively. The declining proportion of state budgets earmarked for higher education led public higher education finance researchers to focus on the second alternative. Key strategies include: allocating constrained state resources equitably and efficiently, pricing tuition fairly and reducing inequitable financial assistance, and planning for increased student diversity.
The dual focus on equity and efficiency resulted in several creative proposals. Two scholars, for example, developed a tuition-pricing model, based on student willingness to pay, that helped colleges establish equitable tuition rates. Other scholars proposed changes in the distribution of state funds among colleges.
Another researcher, who proposed a major role for faculty and staff in determining college budgets, posed a key question for scholars and practitioners: How, precisely, can we determine the financial health of an institution? More precise assessments, this researcher added, would permit association leaders and faculty members to compare the financial standing of their college to similar institutions. Scholars usually assess financial health by comparing income sources to expenditures. Income streams include state and local revenue; tuition and fees; endowment income; and federal, state, local, and private grants and contracts. Expenditures include the costs of instruction; research; academic support; student services; institutional support; plant operation and maintenance; scholarships and fellowships; and educational, general, and current fund expenditures and transfers.
Applying inconsistent definitions to key items often leads to inconclusive, even misleading, interinstitutional comparisons. Is there a better way of comparing the financial condition of institutions?
Fulltext: PDF

Z/Yen and Technology Commercialisation
© Z/Yen Limited, 2001. Risk/Reward Managers, 5-7 St Helen's Place, London EC3A 6AU, United Kingdom. Tel: +44 20 7562-9562, fax: +44 20 7628-5751.
Z/Yen Ltd, the leading risk/reward management group, provides consultancy in science and technology for organisations and potential investors in technology-based ventures.
Fulltext: PDF

Combining DEA and factor analysis to improve evaluation of academic departments given uncertainty about the output constructs.
Authors: S-Claudina Vargas and Dennis Bricker
Department of Industrial Engineering, University of Iowa, Iowa City, IA 52242, USA. April 2000
Keywords: Data envelopment analysis (DEA); Factor analysis (FA); University programs; Policy decisions.
This paper combines the CCR output-oriented model of data envelopment analysis (DEA) and Factor Analysis (FA) to evaluate the performance of academic units of a university's graduate programs relative to their counterparts nationally. We propose DEA/FA as a means of increasing the utility of DEA for policy decisions when there is uncertainty about the output constructs relevant to the programs. We discuss the concept that an academic program often maximizes the levels of some constructed outputs (CO), which may not themselves be directly observable. By means of FA, these COs can be deduced from the observable outputs, and can be expressed as a linear combination of observed and random components. Using the COs lessens the caveat of extreme specialization without the requirement for value judgements.
Fulltext: PDF

The Pitfalls of Convergence Analysis: Is the Income Gap Really Widening?
Authors: MATTHEW A. COLE and ERIC NEUMAYER
Department of Economics, University of Birmingham, Edgbaston, Birmingham,
A number of studies have tested whether, globally, per capita incomes are converging over time. To date, the majority of studies find no evidence of absolute convergence, but many find evidence of conditional convergence, i.e. convergence having controlled for differences in technological and behavioural parameters. The lack of evidence of absolute convergence has led to claims that global income inequality is deteriorating. We believe this to be untrue. Most convergence studies are aimed at proving or disproving the neoclassical growth model and hence take the country as the unit of measurement. However, if one is making inferences about world income distribution, the focus should be on people rather than countries, to prevent China and Luxembourg, for example, receiving equal weighting in the analysis. We use the β-convergence method and two different measures of per capita income and show that there is indeed evidence of income divergence between countries. However, crucially, we also find convincing evidence of income convergence if we weight our regressions by population. Thus, we find that poor people's incomes are growing faster than rich people's incomes, suggesting that global income inequality is in fact improving.
Fulltext: PDF
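
The difference between treating countries and treating people as the unit of analysis can be sketched with a small example. It assumes the usual β-convergence setup, in which growth is regressed on the log of initial income and a negative slope indicates convergence; the function name and the four "countries" are hypothetical, made-up numbers and not data from the paper.

```python
from typing import Optional, Sequence


def weighted_slope(
    x: Sequence[float], y: Sequence[float], w: Optional[Sequence[float]] = None
) -> float:
    """Slope of a (weighted) least-squares regression of y on x with intercept."""
    if w is None:
        w = [1.0] * len(x)  # equal weights: every country counts the same
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    return sxy / sxx


# Made-up toy data: one large, fairly poor, fast-growing country, two small
# poor stagnating countries, and one rich country with moderate growth.
log_initial_income = [2.0, 1.0, 1.1, 4.0]
growth = [0.06, 0.00, 0.00, 0.02]
population = [1000.0, 5.0, 5.0, 50.0]

beta_by_country = weighted_slope(log_initial_income, growth)             # countries as units
beta_by_person = weighted_slope(log_initial_income, growth, population)  # people as units
```

In this toy data the unweighted slope is positive (divergence between countries) while the population-weighted slope is negative (convergence between people), which is the kind of sign reversal the abstract describes.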

DETERMINING THE NUMBER OF FACTORS IN APPROXIMATE FACTOR MODELS
Authors: Jushan Bai and Serena Ng
Department of Economics, Boston College, Chestnut Hill, MA 02467
Keywords: Factor analysis, asset pricing, principal components, model selection.
In this paper we develop some econometric theory for factor models of large dimensions. The focus is the determination of the number of factors (r), which is an unresolved issue in the rapidly growing literature on multifactor models. We first establish the convergence rate for the factor estimates that will allow for consistent estimation of r. We then propose some panel Cp criteria and show that the number of factors can be consistently estimated using the criteria. The theory is developed under the framework of large cross-sections (N) and large time dimensions (T). No restriction is imposed on the relation between N and T. Simulations show that the proposed criteria have good finite sample properties in many configurations of the panel data encountered in practice.
Fulltext: PDF

Neural Networks for Density Estimation
Authors: Malik Magdon-Ismail and Amir Atiya
Advances in Neural Information Processing Systems (NIPS) 1998, to appear.
California Institute of Technology
We introduce two new techniques for density estimation. Our approach poses the problem as a supervised learning task which can be performed using Neural Networks. We introduce a stochastic method for learning the cumulative distribution and an analogous deterministic technique. We demonstrate convergence of our methods both theoretically and experimentally, and provide comparisons with the Parzen estimate. Our theoretical results demonstrate better convergence properties than the Parzen estimate.
Fulltext: PostScript