October 30,2023

The age and race data were imported only for records that have values for both variables. According to the age data, the median age of the Asian population is 35 years. The mean is about 35.96 years, with a standard deviation of 11.5921 and a variance of 134.377. For the Asian population the skewness, which measures the asymmetry of the age distribution, is 0.327765. The kurtosis, which gives information on the tails and sharpness of the age distribution, is 2.35263. The median age of the Black population according to the age data is 31 years. At 32.9281 years, the mean age is slightly higher. The variance is 129.701 and the standard deviation is 11.3886. With a skewness of 0.962894, the age distribution of the Black population is more skewed to the right, which indicates greater asymmetry. Compared to the Asian population, the kurtosis of 3.81164 indicates a sharper peak and heavier tails.
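The summary numbers above can be reproduced with a few lines of Python. This is only a minimal sketch: the `records` list and the group labels are made-up placeholders, not the real dataset, and the `describe` helper is a hypothetical name. It assumes population (divide-by-n) moments and the kurtosis convention in which a normal distribution gives 3; a tool that divides by n - 1 or reports excess kurtosis would give slightly different values.

```python
import math
from statistics import mean, median

def describe(ages):
    """Summary statistics for a list of ages.

    Uses population (divide-by-n) moments. Kurtosis is the non-excess
    form, so a normal distribution would give a value of 3.
    """
    n = len(ages)
    m = mean(ages)
    m2 = sum((a - m) ** 2 for a in ages) / n  # variance
    m3 = sum((a - m) ** 3 for a in ages) / n  # third central moment
    m4 = sum((a - m) ** 4 for a in ages) / n  # fourth central moment
    return {
        "median": median(ages),
        "mean": m,
        "variance": m2,
        "std": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }

# Hypothetical (age, race) records; None marks a missing value.
records = [(35, "A"), (None, "B"), (22, "B"), (41, None),
           (58, "A"), (29, "B"), (31, "A"), (47, "B")]

# Keep only the records that have values for both variables.
complete = [(a, r) for a, r in records if a is not None and r is not None]

# Group the ages by race and summarise each group.
by_race = {}
for age, race in complete:
    by_race.setdefault(race, []).append(age)
summary = {race: describe(ages) for race, ages in by_race.items()}
```

Printing `summary` gives one dictionary of statistics per group, which makes it easy to compare the groups side by side.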

October 20,2023

The age of individuals varies from 13 to 88 years. The median age is 31.7 years, slightly below the mean age of 32.7 years. The standard deviation, which measures the spread around the mean, is about 11.4 years. Since the skewness value is about 0.99, the data is skewed to the right. The kurtosis value of 3.91 is above the kurtosis of 3 for a normal distribution, which indicates a sharper peak and heavier tails. This could point to outliers or to a concentration of individuals around certain ages.

October 18,2023

The statistics of the age distribution were examined using only the cells that include age values. Because the mean and median are different, the distribution of ages is skewed to the right; in fact the skewness is about 0.73. The kurtosis is close to 3, indicating that there are no significant fat tails or peaks around the mean. After checking, the percentage of the age distribution in the right tail, more than 2 standard deviations above the mean, is somewhat greater than what we would get for a normal distribution.
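The right-tail check can be sketched as follows. The `ages` list is a made-up placeholder and `right_tail_fraction` is a hypothetical helper name; the sketch also assumes the population standard deviation is used. For a normal distribution the share of values more than 2 standard deviations above the mean is about 2.3%.

```python
from statistics import NormalDist, mean, pstdev

def right_tail_fraction(values, k=2.0):
    """Fraction of values lying more than k standard deviations above the mean."""
    m = mean(values)
    s = pstdev(values)  # population standard deviation (an assumption)
    return sum(v > m + k * s for v in values) / len(values)

# Reference value for a normal distribution: P(Z > 2), about 0.0228.
normal_tail = 1 - NormalDist().cdf(2.0)

# Hypothetical right-skewed ages, for illustration only.
ages = [20, 22, 23, 25, 25, 27, 28, 30, 31, 33,
        35, 36, 38, 41, 45, 52, 60, 75, 84]
observed_tail = right_tail_fraction(ages)
print(f"observed: {observed_tail:.3f}, normal: {normal_tail:.3f}")
```

If the observed fraction is larger than the normal reference value, the right tail is heavier than a normal distribution would predict.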

October 11,2023

Running several separate tests to compare groups based on factors such as age and race can be repetitive and time-consuming. The ANOVA test is a much better method. It allows us to compare all groups at once, which makes the analysis easier and faster, and it helps us understand whether and how these factors are related.
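To make the idea concrete, here is a minimal sketch of the one-way ANOVA F statistic computed by hand. The three groups are made-up numbers and `one_way_anova_f` is a hypothetical helper name. The sketch stops at the F statistic and its degrees of freedom; a library routine such as `scipy.stats.f_oneway` would also return the p-value.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)                      # number of groups
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [mean(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (gm - grand_mean) ** 2
                     for g, gm in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - gm) ** 2
                    for g, gm in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical ages for three groups, for illustration only.
groups = [[25, 31, 38, 44], [22, 27, 30, 35], [29, 40, 47, 52]]
f_stat, df_b, df_w = one_way_anova_f(groups)
print(f"F = {f_stat:.3f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F value means the group means differ by more than the variation inside the groups would explain, and all groups are tested in a single step instead of pair by pair.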

October 06,2023

 

The residuals from the linear regression model of diabetes and inactivity provide valuable insight into how well the model fits the sample. Taken together, they give us an assessment of the linear regression model. The residuals have a moderate spread, are roughly symmetric around the center, and have few outliers. While these characteristics are important for understanding model performance, they do not directly reflect the strength or uncertainty of the relationship between diabetes and inactivity, which would require examining the regression coefficients and the R-squared value, if available.
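A residual check of this kind can be sketched as below. The `inactivity` and `diabetes` lists are made-up placeholder percentages, not the real data, and `fit_line` and `residual_report` are hypothetical helper names. Flagging residuals beyond 2 standard deviations as outliers is also just one possible rule.

```python
from statistics import mean, median, pstdev

def fit_line(x, y):
    """Least-squares slope and intercept for a simple linear regression."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def residual_report(x, y):
    """Fit y on x and summarise the residuals alongside R-squared."""
    slope, intercept = fit_line(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    my = mean(y)
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    sd = pstdev(residuals)
    return {
        "slope": slope,
        "intercept": intercept,
        "r_squared": 1 - ss_res / ss_tot,
        "residual_std": sd,                    # spread of the residuals
        "residual_median": median(residuals),  # near 0 suggests symmetry
        "n_outliers": sum(abs(r) > 2 * sd for r in residuals),
    }

# Hypothetical county-level percentages, for illustration only.
inactivity = [14.2, 15.8, 16.5, 17.1, 18.0, 19.3, 20.4, 21.7]
diabetes = [6.9, 7.4, 7.2, 8.1, 8.0, 8.8, 9.5, 9.1]
report = residual_report(inactivity, diabetes)
```

The residual entries describe how the points scatter around the line, while the slope and `r_squared` describe the relationship itself, which is why both need to be reported.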

October 04,2023

The bootstrap is a versatile and effective statistical tool that can be used to quantify the uncertainty surrounding a given estimate or statistical learning technique. It can provide a confidence interval for a coefficient or an estimate of the standard error of that coefficient. The bootstrap method lets the computer imitate the process of obtaining fresh data sets by resampling the original data with replacement, allowing us to estimate the variability of our estimate without having to collect more samples.
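As a minimal sketch of the procedure, the code below bootstraps the mean of a small made-up sample. The `bootstrap` helper, the `ages` list, the number of resamples and the seed are all illustrative assumptions, and the interval shown is the simple percentile interval, which is only one of several bootstrap intervals. Any other statistic, such as a regression coefficient, could be passed in place of `mean`.

```python
import random
from statistics import mean, stdev

def bootstrap(data, statistic, n_boot=2000, seed=0):
    """Bootstrap standard error and 95% percentile interval for a statistic."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Resample n observations from the data, with replacement.
        sample = rng.choices(data, k=n)
        estimates.append(statistic(sample))
    estimates.sort()
    std_error = stdev(estimates)
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return std_error, (lower, upper)

# Hypothetical sample of ages, for illustration only.
ages = [23, 31, 35, 28, 44, 52, 19, 37, 41, 30]
std_error, (lower, upper) = bootstrap(ages, mean)
print(f"SE = {std_error:.2f}, 95% interval = ({lower:.1f}, {upper:.1f})")
```

Each resample plays the role of a fresh data set, so the spread of the resampled estimates stands in for the variability we would see across real repeated samples.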

 

 

October 02,2023

Bootstrapping creates many different versions of the dataset by resampling, which allows us to see how different subsets of the data affect the result. Bootstrapping also helps us get a more accurate understanding of the overall diabetes population, and it can help us calculate more accurate confidence intervals. Bootstrapping also helps us assess how the model performs on new data that it has not seen.
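One common way to use the bootstrap for assessing performance on unseen data is the out-of-bag approach: fit the model on each resample and test it on the rows that were not drawn. The sketch below assumes that approach, which may differ from the exact method used in this project. The data are made-up placeholders and `fit_line` and `oob_mse` are hypothetical helper names.

```python
import random
from statistics import mean

def fit_line(x, y):
    """Least-squares slope and intercept for a simple linear regression."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def oob_mse(x, y, n_boot=500, seed=0):
    """Average out-of-bag mean squared error over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(x)
    errors = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # rows drawn with replacement
        drawn = set(idx)
        held_out = [i for i in range(n) if i not in drawn]
        xs = [x[i] for i in idx]
        if not held_out or len(set(xs)) < 2:
            continue  # nothing left to test on, or the slope is undefined
        slope, intercept = fit_line(xs, [y[i] for i in idx])
        # Score the fitted line only on rows it did not see.
        errors.append(mean((y[i] - (intercept + slope * x[i])) ** 2
                           for i in held_out))
    return mean(errors)

# Hypothetical percentages, for illustration only.
inactivity = [14.2, 15.8, 16.5, 17.1, 18.0, 19.3, 20.4, 21.7]
diabetes = [6.9, 7.4, 7.2, 8.1, 8.0, 8.8, 9.5, 9.1]
error = oob_mse(inactivity, diabetes)
```

Because every error is measured on rows left out of the resample, the average gives a rough picture of how the model would do on data it was not fitted to.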