University of Wisconsin - Madison (Sep 2012 - May 2016) BS in Mathematics (GPA: 4.0/4.0) & BS in Statistics (GPA: 3.95/4.0)
Minor in Computer Science (GPA: 4.0/4.0) General GPA: 3.906/4.0
Dean’s List (8000/50000), University of Wisconsin-Madison (2012, 2013, 2014, 2015)
Computer skills: Java, C, C++, Excel, R, MATLAB
Interests: Piano, electronic organ, handicraft
The effect of different initializations on network clustering performance using balanced label propagation
Social networks are among the largest data sets of modern times. Because of hardware constraints and the demand for fast data access, it is beneficial to store the network information in several smaller, densely connected groups. Moreover, to make full use of computing resources, the clusters should be of similar size. One algorithm that may achieve both goals is balanced label propagation. However, this algorithm must be initialized with balanced clusters, so the quality of the initialization largely determines the quality of the final clustering. We are interested in the effect of different initializations on different social networks with varying cluster sizes.
In this study, we test three initializations: (1) random, (2) spectral clustering, and (3) an ego-network approach. In experiments on three different social networks, spectral clustering always outperformed the ego-network and random initializations when the balance constraint was ignored. However, as more weight is placed on balance, the performance of spectral clustering becomes highly unstable, largely because it sometimes produces highly unbalanced clusterings. The ego-network approach, on the other hand, outperformed random initialization in every setting.
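As a rough illustration of the simplest of the three initializations, the sketch below builds a balanced random assignment and scores it on a toy graph. It is a minimal sketch, not the thesis code: the function names, the toy graph, and the use of the within-cluster edge fraction as a quality measure are all assumptions made for this example.

```python
import random


def balanced_random_init(nodes, k, seed=0):
    """Assign nodes to k clusters whose sizes differ by at most one.

    Hypothetical helper: shuffle the nodes, then deal them out round-robin.
    """
    rng = random.Random(seed)
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    return {node: i % k for i, node in enumerate(shuffled)}


def cluster_sizes(labels, k):
    """Count how many nodes carry each of the k cluster labels."""
    sizes = [0] * k
    for cluster in labels.values():
        sizes[cluster] += 1
    return sizes


def within_cluster_edge_fraction(edges, labels):
    """Fraction of edges whose two endpoints share a cluster label.

    Assumed quality measure for this sketch; higher means fewer cut edges.
    """
    if not edges:
        return 0.0
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy graph (made up for illustration): two triangles joined by one edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

random_labels = balanced_random_init(range(6), k=2)
sizes = cluster_sizes(random_labels, 2)
random_fraction = within_cluster_edge_fraction(edges, random_labels)

# Hand-picked balanced split that keeps each triangle together.
ideal_labels = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
ideal_fraction = within_cluster_edge_fraction(edges, ideal_labels)
```

Both assignments are perfectly balanced, so any gap between `random_fraction` and `ideal_fraction` comes only from where the cut falls, which is the kind of difference a better initialization is meant to capture.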
Social networks, Spectral clustering, Ego-network, Balanced label propagation
The full thesis and poster are available on request.
The Social Effects on US Foreclosure Rate
A house is one of the most important assets in a person’s life. The status of a house is closely tied to its owner’s economic, social, and even cultural capital. Because houses cost far more than ordinary commodities, most people now take out loans to buy their dream homes. This is an attractive choice for most people because they can enjoy the house before they fully own it. However, loans carry risk. Once the borrower fails to pay on time, the mortgage is considered delinquent, and if the borrower fails to pay within a certain period, the lender can start a legal process called foreclosure and force the sale of the house.
It is heartbreaking for a family to lose its house. However, the foreclosure rate has a far more profound effect on society. A higher foreclosure rate may signal instability in a neighborhood, and it may lead to higher divorce rates and even higher crime rates. Therefore, we are interested in the factors that may lead to, or be caused by, a higher foreclosure rate.
Foreclosure rate, Social factors, Unemployment rate, Race, Principal Component Analysis
NFL Team Performance Regression Analysis
For the past 30 years, American football has been deemed the most popular sport in the United States. With its popularity exploding in recent years, this multimillion-dollar industry has become very important to more than just the fans and athletes. The success of an organization has become truly larger than a game, affecting revenue streams for entire cities. With so much riding on the output of these teams, the question that begs to be addressed is: what makes a football team successful?
To answer this question, this report considers a multitude of on-field, play-by-play statistics that could be associated with a professional football team’s success. Play-by-play statistics are expected to be better predictors of performance than “drive” statistics because they occur more frequently during a game or season, and they describe more focused aspects of performance that an NFL team could aim to improve, rather than a general “score more touchdowns” approach.
To measure regular-season success, win percentage was considered first. However, each NFL team plays 16 regular-season games, so win percentage can take only 17 distinct values (0 through 16 wins, ignoring ties). Point differential was therefore chosen as the response variable, as it is much closer to a continuous variable and more likely to satisfy linear regression assumptions. It is also a good proxy for team success: it was found to have a very high correlation of 0.93 with win percentage.
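The reasoning about the response variable can be sketched in a few lines: win percentage over a 16-game season falls on a small discrete grid (wins from 0 to 16, ignoring ties), while point differential varies much more finely, and the two can be compared with a Pearson correlation. This is a minimal sketch only. The season records below are made-up placeholders, not the data used in the project, and the `pearson` helper is written here for illustration.

```python
from math import sqrt

GAMES = 16  # regular-season games per team, as stated in the report

# Every win percentage a team can reach when ties are ignored.
possible_win_pcts = sorted({wins / GAMES for wins in range(GAMES + 1)})
n_values = len(possible_win_pcts)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Illustrative (wins, point differential) pairs: NOT real NFL data.
records = [(13, 150), (12, 110), (10, 45), (8, -5), (6, -60), (4, -120)]
win_pct = [wins / GAMES for wins, _ in records]
point_diff = [float(diff) for _, diff in records]

r = pearson(win_pct, point_diff)
```

On real season data, the same calculation would be the check behind using point differential as a stand-in for win percentage.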
NFL, Linear regression, Point differential, Model selection
The full project reports are available on request.
Stat 456 Applied Multivariate Analysis
Stat 479 Introduction to Classification and Regression Trees
Stat 575 Statistical Methods for Spatial Data (graduate-level class)
Stat 992 Theory and Methods for Social Network Analysis (graduate-level class)
Stat 224 Introductory Statistics for Engineers
Stat 310 Introduction to Probability and Mathematical Statistics II
Stat 327 Learning a Statistical Language R
Stat 333 Applied Regression Analysis
Stat 349 Introduction to Time Series
Stat 411 An Introduction to Sample Survey Theory and Methods
Stat 424 Statistical Experimental Design
Math 234 Calculus - Functions of Several Variables
Math 341 Linear Algebra
Math 421 The Theory of Single Variable Calculus
Math 431 Introduction to the Theory of Probability
Math 475 Introduction to Combinatorics
Math 521 Analysis I
Math 531 Probability Theory
CS 302 Introduction to Programming
CS 354 Machine Organization and Programming
CS 367 Introduction to Data Structures
CS 435 Introduction to Cryptography
Sep 2015 - May 2016
Worked in university residence halls, providing drop-in tutoring for all levels of calculus classes; helped organize residence hall activities.
Sep 2014 - Dec 2014
Assisted in teaching a calculus section and supported students in group discussions and individual Q&A; helped students achieve better grades in their standard calculus classes.
Jan 2014 - Dec 2014
Helped high school students with their math homework and ACT practice; motivated students through personal discovery, academic improvement, and career exploration.
Nov 2014, Nov 2013
Held an annual event for middle school girls to visit campus and take part in science activities.
Apr 2013, Apr 2014
Provided online support to resolve technical problems during the competition.
Address: 530 West Johnson Street, Apt 707, Madison, Wisconsin, 53703
Email: xyang222@wisc.edu
Phone: 608-301-7012