Stats Project Installment 1 - Answer with 100/100 Grade
By: ericawing • February 16, 2019 • Coursework • 1,934 Words (8 Pages) • 1,934 Views
Credit Risk Project, Installment 1
These questions use your team project dataset. For the sake of checking conditions, assume that the cases represent a simple random sample from the population of loans described in the project description. You do not have all of the loans, just a small sample out of the many thousands handled by this lender. For these questions, use only the 628 loans with complete descriptions in your data table.[1]
A key variable in the remainder of the analysis is defined as follows. The lender uses a performance metric known as PRSM, performance ratio at six months. To construct PRSM, divide two times the amount repaid at six months by the total amount to be repaid:
PRSM = 2 * (Amount repaid at six months / Total amount to be repaid)
PRSM should be approximately equal to 1 if the payments at 6 months are on track to fulfill the debt at the end of the year. Values of PRSM < 1 indicate a loan for which the payments today are coming in slower than expected; PRSM > 1 indicates a loan that is being paid off faster than expected. You will need to create this column in your JMP dataset using the formula calculator. The formula calculator manipulates columns in the data table.[2]
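The PRSM definition can also be sketched outside JMP. The snippet below is a minimal illustration in Python, not the JMP formula-calculator entry itself; the function name `prsm` and the loan amounts are hypothetical and were chosen only to show the three cases described above.

```python
def prsm(repaid_at_six_months, total_to_repay):
    """Performance ratio at six months: 2 * repaid / total."""
    if total_to_repay <= 0:
        raise ValueError("total_to_repay must be positive")
    return 2 * repaid_at_six_months / total_to_repay


# Hypothetical loans (not from the project data), each with $10,000 to repay.
on_track = prsm(5_000, 10_000)  # half repaid at six months
behind = prsm(3_750, 10_000)    # repaying slower than expected
ahead = prsm(6_000, 10_000)     # repaying faster than expected
print(on_track, behind, ahead)
```

A loan that has repaid exactly half of its total at six months scores 1, which is why values below 1 flag slow repayment and values above 1 flag fast repayment.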
- (a) Using both graphics and descriptive statistics, describe the shape and form of the distribution of the PRSM score. Does it appear reasonable to use the Empirical Rule with this variable?
[pic 2]
[pic 3]
The distribution of the PRSM score is not bell-shaped and is not normal. About 97.5% of the PRSM scores lie between 0 and 1, and there is one extreme outlier at 129.16334.
As the Normal Quantile Plot further shows, most of the points fall outside the dashed lines above and below the diagonal, so the PRSM data do not agree even approximately with a normal distribution.
Therefore, it is not reasonable to use the Empirical Rule with this variable.
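One way to see why a single extreme value breaks the Empirical Rule is to compare the observed share of cases within 1, 2, and 3 standard deviations of the mean against the rule's 68%, 95%, and 99.7%. The sketch below uses Python's standard library; the helper name `empirical_rule_coverage` and the ten values are hypothetical, not the project data.

```python
import statistics


def empirical_rule_coverage(values):
    """Share of values within 1, 2, and 3 sample SDs of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    n = len(values)
    return {
        k: sum(abs(v - mean) <= k * sd for v in values) / n
        for k in (1, 2, 3)
    }


# Hypothetical scores: nine ordinary values plus one extreme outlier.
scores = [0.6, 0.7, 0.75, 0.8, 0.8, 0.85, 0.9, 0.95, 1.0, 129.0]
coverage = empirical_rule_coverage(scores)
print(coverage)
```

In this toy sample the outlier inflates the standard deviation so much that the one-SD interval already contains 90% of the cases, far from the 68% the rule predicts.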
(b) If you were to remedy any anomalies with this variable, for example by excluding specific cases, then how well would the Empirical Rule work now?
Excluding the outlier at 129.16334 gives the new distribution of the PRSM score shown below, which is close to bell-shaped.
[pic 4]
Again, we use a Normal Quantile Plot to judge whether this distribution of PRSM scores can be treated as normal.
[pic 5]
As shown in the Normal Quantile Plot, all the points lie between the two dashed lines on either side of the diagonal. Thus, the distribution can be considered approximately normal, and the Empirical Rule works well for this variable.
- (a) The variable Years in Business may ultimately be useful in forecasting the PRSM score. Comment on the distribution of this predictor.
[pic 6]
As shown above, the distribution of Years in Business is right-skewed, which suggests that it may be worth looking at the variable on a transformed scale (using a log transformation).
(b) It is common (but not a requirement) to take the log of a variable that displays a shape such as this variable does. Further, the log transform is not defined for the value of zero, so when a variable contains zero values, we modify the log transform to include an offset term. The most common offset is 1. Comment on the distribution of log (1 + Years in Business).
[pic 7]
As shown above, the log transformation pulls the data closer together and reduces the degree of skewness. Although the transformed data have a lower variance, the distribution is still right-skewed.
[pic 8]
In the Normal Quantile Plot, some points fall outside the dashed lines, so the transformed variable is not approximately normal and the Empirical Rule does not apply to it.
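The effect of the offset log transform on skewness can be illustrated numerically. This is a minimal sketch in Python: the `skewness` helper (the usual moment-based coefficient) and the `years` values are hypothetical and are not taken from the project dataset.

```python
import math
import statistics


def skewness(values):
    """Moment-based skewness: mean of cubed standardized deviations."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical right-skewed Years in Business values, including a zero.
years = [0, 1, 1, 2, 2, 3, 4, 5, 8, 12, 20, 35]

# log(1 + x) is defined at x = 0, unlike log(x).
logged = [math.log1p(y) for y in years]

skew_raw = skewness(years)
skew_logged = skewness(logged)
print(round(skew_raw, 2), round(skew_logged, 2))
```

For these illustrative values the skewness drops sharply after the transform but stays positive, which matches the pattern described above: less skewed, yet still right-skewed.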
- Suppose 25 new loans are originated to different merchants, each with a Total Amount to be Repaid of $10,000. You can assume that these PRSM scores represent a sample from a normally distributed population. As a simplifying assumption, you can assume that the population's mean and standard deviation is the same as the mean and standard deviation of the sample in your dataset. Make sure you are using the data set where any initial anomalies in the PRSM score have been remedied.
- Estimate the probability that, in 6 months, the PRSM score of the first of these 25 loans will be less than 0.75. (You can use the JMP file “Norm Prob.jmp” to calculate these probabilities).
P(x < 0.75) ≈ 0.36
By the simplifying assumption, the population of PRSM scores is normal with mean and standard deviation equal to those of the cleaned sample: μ = 0.795 and σ = 0.124.
The corresponding z-score is (0.75 − 0.795)/0.124 ≈ −0.36.
Plug mean 0.795 and standard deviation 0.124 into “Norm Prob.jmp” and set a = 0.75 to get P(x < 0.75) ≈ 0.36.
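The same figure can be cross-checked without “Norm Prob.jmp”. The sketch below uses `statistics.NormalDist` from the Python standard library as a stand-in for the JMP calculator; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 0.795, 0.124  # mean and SD of the cleaned PRSM sample

# z-score of the 0.75 cutoff and the lower-tail probability for one loan.
z = (0.75 - mu) / sigma
p_single = NormalDist(mu, sigma).cdf(0.75)
print(round(z, 2), round(p_single, 2))
```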
- What is the approximate probability that, in 6 months, the average PRSM score of these 25 loans will be less than 0.75? State any assumptions you have used for your probability calculation.
P(x-bar < 0.75) ≈ 0.035
The average PRSM score of the sample (n = 25) is normally distributed.
The sampling distribution of x-bar has mean μ (the mean of the entire PRSM score population) = 0.795.
Its standard deviation (the standard error) is σ/√n = 0.124/√25 = 0.124/5 = 0.0248.
Plug mean 0.795 and standard deviation 0.0248 into “Norm Prob.jmp” and set a = 0.75 to get P(x-bar < 0.75) ≈ 0.035.
Assumptions
1. The 25 PRSM scores are an IID sample.
2. The distribution of x-bar is approximately normal, either because the population itself is assumed normal or because n = 25 is sufficiently large, and so it is essentially determined by μ and σ.
3. The distribution of x-bar:
- has mean μ (the mean of the entire PRSM population) = 0.795
- has standard deviation (standard error) σ/√n = 0.0248
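Under these assumptions the calculation for the sample average can be sketched in a few lines. As before, `statistics.NormalDist` stands in for “Norm Prob.jmp”, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 0.795, 0.124, 25

# Standard deviation of the sample mean (the standard error).
standard_error = sigma / sqrt(n)

# Lower-tail probability that the average of 25 scores is below 0.75.
p_mean = NormalDist(mu, standard_error).cdf(0.75)
print(round(standard_error, 4), round(p_mean, 3))
```

The cutoff 0.75 sits about 0.36 standard deviations below the mean for a single loan but about 1.81 standard errors below it for the average of 25, which is why the probability shrinks from roughly 0.36 to roughly 0.035.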
- What is the probability that less than $93,750, in total across the 25 merchants, will have been paid back after 6 months?
The probability is 0.035.
For each loan, PRSM = 2 * (Amount repaid at six months / Total amount to be repaid).
Total amount to be repaid across the 25 merchants = 25 * $10,000 = $250,000.
Because every loan has the same total, the average PRSM of the 25 loans = 2 * (Total amount repaid at six months / $250,000).
When the total amount repaid at six months < $93,750, the average PRSM < 2 * $93,750 / $250,000 = 0.75.
Therefore, the probability that the total amount repaid at six months is < $93,750 is the same as the probability that the average PRSM score is < 0.75.
Plug a = 0.75, mean = 0.795, and standard deviation = 0.0248 into “Norm Prob.jmp” to get P(x-bar < 0.75) ≈ 0.035.
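The equivalence between the dollar threshold and the average-PRSM threshold can be checked by computing the probability both ways. This is a sketch under the same assumptions (normal population, independent loans, μ = 0.795, σ = 0.124); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 0.795, 0.124, 25
loan_total = 10_000          # total to be repaid per loan
threshold_dollars = 93_750   # total repaid across all 25 loans

# Equivalent threshold on the average-PRSM scale.
threshold_prsm = 2 * threshold_dollars / (n * loan_total)

# Route 1: probability on the average-PRSM scale.
p_from_prsm = NormalDist(mu, sigma / sqrt(n)).cdf(threshold_prsm)

# Route 2: probability on the dollar scale.
# Amount repaid per loan = PRSM * loan_total / 2, so it is normal with
# mean mu * loan_total / 2 and SD sigma * loan_total / 2. The sum of n
# independent loans has n times that mean and sqrt(n) times that SD.
per_loan_mean = mu * loan_total / 2
per_loan_sd = sigma * loan_total / 2
total_repaid = NormalDist(n * per_loan_mean, sqrt(n) * per_loan_sd)
p_from_dollars = total_repaid.cdf(threshold_dollars)

print(threshold_prsm, round(p_from_prsm, 3), round(p_from_dollars, 3))
```

Both routes standardize to the same z-score, so they give the same probability.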
...