Exploratory data analysis of loan data
In this project for the Udacity Data Analyst Nanodegree, I explored loan data from Prosper, a US-based lending platform. The data set contains 113,937 loans and 84 variables. The objectives of the analysis were to summarize the data and to determine (1) the relationships between the variables of interest and (2) how the interest rates of individual loans can be predicted from the available data. Using R, I examined the data with a wide range of exploratory plots and linear regression analysis to determine the factors that influence the interest rates of consumer loans in the US. The complete report, the data and the R code can be found in this GitHub repository.
As expected, I found a strong negative relationship between the interest rate and the consumer’s score and income, as well as the length, amount and monthly payment of the loan.
There is an interesting relationship between the interest rate, the monthly payment and the borrower’s score. In particular, for more recent loans that are not yet completed, the monthly payment relative to the loan amount is a good predictor of the interest rate. Similarly, there is a strong negative correlation between the borrower’s credit score and the interest rate of their loan.
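The kind of check behind these statements could look roughly like the sketch below. It is not the code from the report: the data are simulated, and the column names (`credit_score`, `loan_amount`, `term_months`, `interest_rate`, `monthly_payment`, `payment_ratio`) are hypothetical stand-ins rather than the actual variable names in the Prosper data set. The sketch only illustrates how a correlation and a linear regression on the relative monthly payment might be set up in R.

```r
# Simulated stand-in for the loan data. All column names and values are
# hypothetical and chosen for illustration only; they are not Prosper data.
set.seed(42)
n <- 500
loans <- data.frame(
  credit_score = round(runif(n, 600, 850)),
  loan_amount  = round(runif(n, 1000, 25000)),
  term_months  = sample(c(12, 36, 60), n, replace = TRUE)
)

# Assumption: the interest rate falls with the credit score, plus noise.
# pmax() keeps the simulated rate strictly positive.
loans$interest_rate <- pmax(
  0.60 - 0.0006 * loans$credit_score + rnorm(n, sd = 0.02),
  0.01
)

# Monthly payment from the standard annuity formula.
monthly_rate <- loans$interest_rate / 12
loans$monthly_payment <- loans$loan_amount * monthly_rate /
  (1 - (1 + monthly_rate)^(-loans$term_months))

# Monthly payment relative to the loan amount.
loans$payment_ratio <- loans$monthly_payment / loans$loan_amount

# Correlation between credit score and interest rate.
score_cor <- cor(loans$credit_score, loans$interest_rate)
print(score_cor)

# Linear regression of the interest rate on the relative payment,
# the credit score and the loan term.
fit <- lm(interest_rate ~ payment_ratio + credit_score + term_months,
          data = loans)
print(summary(fit))
```

In this simulation the relative payment carries information about the interest rate by construction, because the payment is computed from the rate, the amount and the term. Whether the same holds for a real data set depends on how the payment variable was recorded.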
Not surprisingly, loans with a high initial interest rate are more likely to be defaulted on or to be past due.