November 29, 2023

Augmented Dickey-Fuller (ADF) tests the null hypothesis that a unit root is present in a time series. Each plot is labeled with an “ADF Statistic” and a “p-value,” which are used to determine whether a time series is stationary.

Here is the analysis:

ADF Test: Total Jobs
The plot shows the time series data for “total_jobs.”
The ADF statistic is positive, and the p-value is very high (0.9475), so the test fails to reject the unit-root null hypothesis and the series appears to be non-stationary.

ADF Test: Unemp Rate
This is the time series data for “unemp_rate.”
The ADF statistic is negative, but the p-value is not below the common threshold of 0.05 (0.4789), suggesting the series is likely non-stationary.

ADF Test: Logan Passengers
This plot represents “logan_passengers” over time.
The ADF statistic is positive, and the p-value is extremely high (0.9853), indicating that the series is non-stationary.

ADF Test: Logan Intl Flights
The time series data for “logan_intl_flights” is shown.
The ADF statistic is negative, and the p-value is 0.2306, which is above the 0.05 threshold, suggesting non-stationarity.

ADF Test: Hotel Occup Rate
The plot displays the “hotel_occup_rate” time series.
The ADF statistic is negative, with a p-value of 0.4359, again indicating non-stationarity as the p-value is above 0.05.

ADF Test: Hotel Avg Daily Rate
This plot shows the “hotel_avg_daily_rate” time series.
The ADF statistic is negative, and the p-value is very low (0.0058), suggesting that the series is stationary.

ADF Test: Labor Force Participation Rate
This plot shows the “Labor_Force_Part_Rate” time series.
The ADF statistic is positive, and the p-value is 0.9691. Because the p-value is far above the common threshold of 0.05, the test suggests that the series is non-stationary.

For the ADF test, a p-value below a threshold (commonly 0.05) leads us to reject the null hypothesis of a unit root, which indicates that the series is stationary. A p-value above the threshold means the unit root cannot be ruled out, so the series is treated as non-stationary. A non-stationary time series is characterized by a changing mean or variance over time, which can be problematic for many types of time series analysis, including forecasting.
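
As a rough illustration of what the test statistic measures, the sketch below computes the basic Dickey-Fuller t-statistic using only the Python standard library. It is a simplification, not the procedure used for the plots above: it omits the lagged difference terms that make the test "augmented", it uses made-up series rather than the Boston data, and it compares the statistic against the asymptotic 5% critical value (about -2.86 for a regression with a constant) instead of computing a p-value. In practice a library routine such as adfuller in statsmodels reports both the statistic and the p-value.

```python
import math

def dickey_fuller_t(series):
    """t-statistic of gamma in: diff(y)_t = alpha + gamma * y_(t-1) + error."""
    lagged = series[:-1]
    diffs = [b - a for a, b in zip(series[:-1], series[1:])]
    n = len(diffs)
    mean_x = sum(lagged) / n
    mean_y = sum(diffs) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lagged, diffs))
    gamma = sxy / sxx                      # OLS slope on the lagged level
    alpha = mean_y - gamma * mean_x        # OLS intercept
    ssr = sum((y - alpha - gamma * x) ** 2 for x, y in zip(lagged, diffs))
    std_err = math.sqrt(ssr / (n - 2) / sxx)
    return gamma / std_err

# Asymptotic 5% critical value for the constant-only Dickey-Fuller regression.
CRITICAL_5PCT = -2.86

# Illustrative series only (assumptions, not the Boston indicators).
oscillating = [math.sin(t) for t in range(200)]   # fluctuates around a fixed mean
trending = [float(t * t) for t in range(50)]      # keeps growing over time

for name, data in [("oscillating", oscillating), ("trending", trending)]:
    stat = dickey_fuller_t(data)
    verdict = "stationary" if stat < CRITICAL_5PCT else "non-stationary"
    print(f"{name}: statistic={stat:.2f} -> {verdict}")
```

A statistic well below the critical value rejects the unit root, which matches the reading of the plots above: positive or only mildly negative statistics go with high p-values.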

November 27, 2023

The joint kernel density estimate (KDE) plots illustrate the relationship between different economic indicators and the total number of jobs, using data from the provided dataset. Here’s an analysis of each plot:

Total Jobs vs Hotel Average Daily Rate:
The plot suggests a concentration of points where the average daily hotel rate is around $250, with the highest job numbers.
This may indicate that when hotel rates are at a moderate level, it is correlated with higher employment, possibly due to balanced tourism or business travel activities.

Total Jobs vs Hotel Occupancy Rate:
The highest density is observed at occupancy rates between 0.7 and 0.9, which could suggest a positive association with total jobs.
This pattern implies that higher hotel occupancy rates, potentially indicating higher tourist or business activity, might correspond with higher employment levels.

Total Jobs vs Unemployment Rate:
The density is elongated and negatively sloped, indicating an inverse relationship between the unemployment rate and total jobs, which is expected.
As the unemployment rate decreases, the total number of jobs tends to increase.

Total Jobs vs Labor Force Participation Rate:
The plot shows a slight positive trend, with higher job numbers corresponding to a labor force participation rate mainly between 0.63 and 0.67.
This could imply that as more people participate in the labor force, it is indicative of a stronger job market.

Total Jobs vs Logan International Flights:
The density suggests a positive relationship, with a greater number of jobs associated with an increased number of international flights.
This may reflect the impact of international travel on local employment, particularly in sectors linked to travel, tourism, and possibly international business.

Total Jobs vs Logan Passengers:
As with international flights, total jobs correlate positively with the number of passengers.
The highest density of job numbers coincides with passenger numbers around 3 million, indicating that air travel volume may positively influence employment figures.
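
To make the idea behind a joint KDE concrete, here is a minimal sketch of a two-dimensional Gaussian kernel density estimate written with the standard library. The data points, the bandwidths, and the function name gaussian_kde_2d are all assumptions chosen for illustration; plotting libraries choose bandwidths automatically and the real plots were not produced with this code.

```python
import math

def gaussian_kde_2d(points, bandwidth_x, bandwidth_y):
    """Return a function that estimates the joint density at (x, y)."""
    n = len(points)
    norm = 1.0 / (n * 2.0 * math.pi * bandwidth_x * bandwidth_y)

    def density(x, y):
        total = 0.0
        for px, py in points:
            zx = (x - px) / bandwidth_x
            zy = (y - py) / bandwidth_y
            total += math.exp(-0.5 * (zx * zx + zy * zy))  # one Gaussian bump per point
        return norm * total

    return density

# Hypothetical (hotel occupancy rate, total jobs in thousands) pairs.
sample = [(0.80, 350.0), (0.82, 355.0), (0.78, 352.0), (0.50, 330.0)]
density = gaussian_kde_2d(sample, bandwidth_x=0.05, bandwidth_y=5.0)

dense_region = density(0.80, 352.0)    # close to three of the four points
sparse_region = density(0.50, 352.0)   # far from every point
print(dense_region > sparse_region)
```

The contours in a joint KDE plot are level sets of this kind of density function, so the "highest density" regions are simply where many observations sit close together.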

November 24, 2023

Total Jobs vs Logan Passengers:
There is a positive relationship between the number of passengers at Logan Airport and the total number of jobs. The R-Squared value is approximately 0.729, suggesting that about 72.9% of the variability in total jobs can be explained by the number of Logan passengers. The p-value is extremely low (approximately 3.57 × 10^-15), indicating a statistically significant relationship.

Total Jobs vs Logan International Flights:
Similarly, the number of international flights has a positive correlation with the total number of jobs. The R-Squared value is 0.764, meaning that approximately 76.4% of the variability in total jobs is accounted for by the number of international flights. The p-value is very small (around 3.04 × 10^-17), which implies a statistically significant relationship.

Total Jobs vs Hotel Occupancy Rate:
The relationship between hotel occupancy rates and total jobs is weaker compared to the previous two variables. The R-Squared value is about 0.142, indicating that only 14.2% of the variability in total jobs is explained by the hotel occupancy rate. The p-value is approximately 0.197, which is above the typical significance level of 0.05, suggesting that the relationship might not be statistically significant.

Total Jobs vs Hotel Average Daily Rate:
There is a moderate positive relationship between the average daily rate of hotels and total jobs. The R-Squared value is 0.313, which means that about 31.3% of the variability in total jobs can be explained by the hotel average daily rate. The p-value is approximately 0.0038, indicating a statistically significant relationship at common significance levels.

Total Jobs vs Unemployment Rate:
There is a strong negative relationship between the unemployment rate and the total number of jobs, which is intuitive as higher unemployment would typically be associated with fewer jobs. The R-Squared value is about 0.872, suggesting that 87.2% of the variability in total jobs can be explained by the unemployment rate. The p-value is extremely low (around 4.10 × 10^-27), indicating a very strong statistically significant relationship.
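
The R-Squared values above come from fitting one straight line per indicator. The following sketch shows how the slope, intercept, and R-Squared of such a single-predictor fit can be computed by hand. The numbers are invented for illustration and the function name is an assumption; the p-values quoted above would additionally require a t-distribution, which is left out here.

```python
def simple_linear_regression(x, y):
    """Ordinary least squares with one predictor: returns slope, intercept, R-squared."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)   # squared correlation for a one-predictor fit
    return slope, intercept, r_squared

# Hypothetical values: unemployment rate (%) versus total jobs (thousands).
unemp_rate = [6.5, 5.8, 5.0, 4.2, 3.5, 3.0]
total_jobs = [330.0, 338.0, 345.0, 355.0, 362.0, 370.0]

slope, intercept, r_squared = simple_linear_regression(unemp_rate, total_jobs)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R-squared={r_squared:.3f}")
```

A negative slope with an R-Squared near 1 corresponds to the strong inverse relationship described for the unemployment rate.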

November 22, 2023

R-squared (0.9564): This indicates a very high proportion of variance in the dependent variable (total jobs) is predictable from the independent variables in the model.
Adjusted R-squared (0.9213): This is a modified version of R-squared adjusted for the number of predictors in the model, still indicating a good fit.
MAE (Mean Absolute Error): The average absolute error of the predictions is 3889 jobs.
MSE (Mean Squared Error): The average squared difference between the estimated values and the actual values is 2,229,292.5, a measure that gives higher weight to larger errors.
RMSE (Root Mean Squared Error): The square root of MSE, which is 4708 jobs, gives an idea of the magnitude of the errors in the same units as the dependent variable (total jobs).

R-squared (0.9564): This value is very high, suggesting the model explains a large proportion of the variance in the validation dataset.
Adjusted R-squared (0.9213): This is also high, indicating that the model fits the validation data well even after penalizing for the number of predictors.
MAE (Mean Absolute Error) (3888.99): On average, the model’s predictions are off by approximately 3889 jobs from the actual values.
MSE (Mean Squared Error) (2,229,292.5): This is relatively high, influenced by the squared nature of the metric which gives more weight to larger errors.
RMSE (Root Mean Squared Error) (4708.31): This is the square root of the MSE and provides an error term in the same units as the predicted variable (total jobs). This value suggests that typical predictions are within approximately 4708 jobs of the actual values.
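
For reference, the metrics listed above can be computed from the actual and predicted values as in the sketch below. The sample values and the function name regression_metrics are assumptions for illustration; the adjusted R-squared uses the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors.

```python
import math

def regression_metrics(actual, predicted, n_predictors):
    """Return MAE, MSE, RMSE, R-squared and adjusted R-squared as a dict."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_actual = sum(actual) / n
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - (mse * n) / ss_total
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2, "adj_r2": adj_r2}

# Made-up values, not the validation data from the model above.
metrics = regression_metrics(
    actual=[10.0, 20.0, 30.0, 40.0],
    predicted=[12.0, 18.0, 33.0, 39.0],
    n_predictors=1,
)
print(metrics)
```

Because RMSE is the square root of MSE, squaring the RMSE should reproduce the MSE, and RMSE can never be smaller than MAE. Both facts are handy for checking reported metrics against each other.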

November 20, 2023

Hotel Average Daily Rate: Shows a distribution with a clear peak, which may indicate a common average daily rate around which hotel prices are centered. The spread of the plot might suggest variations in pricing, which could reflect different hotel categories or seasonal pricing strategies.

Hotel Occupancy Rate: Has a unimodal distribution, perhaps indicating that most hotels maintain a consistent occupancy rate, with fewer occurrences of very low or very high occupancy. This could reflect a stable demand for accommodation in Boston.

Labor Force Participation Rate: Shows a tight distribution, indicating that the participation rate does not fluctuate widely and remains relatively stable over time.

Logan International Flights: Displays the distribution of the number of international flights at Logan Airport. A unimodal, possibly slightly skewed distribution would suggest that there’s a common range of flight numbers, with occasional periods of increased or decreased international traffic.

Logan Passengers: Shows a distribution potentially skewed to one side, indicating variability in passenger numbers. Peaks could correspond to high-travel seasons or specific events that attract more travelers.

Total Jobs: Appears to illustrate the distribution of total job numbers in Boston. The distribution might be relatively broad, indicating variability in employment numbers, which could be influenced by economic cycles, job market health, and seasonal employment trends.

Unemployment Rate: Seems to have a pronounced peak, suggesting the most common unemployment rate that the city experiences. A narrower peak could indicate a relatively stable unemployment rate over the period analyzed.

November 17, 2023

Logan Passengers: Shows the frequency distribution of the number of passengers traveling through Logan Airport. The distribution seems to be skewed to the right, indicating that there are months with exceptionally high passenger numbers, possibly during peak travel seasons or special events.

Logan Intl Flights: The histogram for international flights also appears to be right-skewed, suggesting that while there is a consistent average number of flights, there are periods with significantly higher international traffic.

Hotel Occupancy Rate: Shows a potentially left-skewed distribution, indicating that there are fewer instances of low occupancy rates and a tendency for higher occupancy in most months.

Hotel Avg Daily Rate: Exhibits a broad, multimodal distribution with several peaks, suggesting that there are common price points at which hotels set their daily rates, with fluctuations around these points.

Total Jobs: Looks to be roughly normally distributed with a slight right skew, implying that most months have a consistent number of jobs, with occasional peaks possibly due to seasonal employment or economic growth.

Unemployment Rate: Shows a right-skewed distribution, indicating that lower unemployment rates are more common, with fewer occurrences of higher rates.

Labor Force Participation Rate: Appears somewhat normally distributed, suggesting that the labor force participation rate in Boston remains relatively stable over time.

November 15, 2023

Understanding the general pattern in order to determine Boston’s tourism peak times:

The monthly passenger data at Logan Airport paints a vivid picture of tourism trends in Boston. Throughout the year, we can see distinct peaks and valleys in the number of passengers, which correspond to the highs and lows of the tourism season. This cyclical pattern is a key indicator of when the city experiences its highest influx of visitors.

Moreover, by looking at the graph year-over-year, we can observe whether tourism in Boston is growing, remaining stable, or facing a decline. Such insights are crucial for stakeholders in the tourism industry, as they provide a clear view of the most popular times for tourists in the city. This information can be invaluable for planning purposes, whether it’s for staffing needs, marketing campaigns, or resource management. In essence, the passenger numbers at Logan Airport offer a reliable barometer for understanding and anticipating the dynamics of tourism in Boston.

November 13, 2023

I began examining data from Analyze Boston, the open data repository for the City of Boston. The dataset is a legacy collection of economic indicators from the Boston Planning and Development Authority (BPDA), which was in charge of organizing and directing inclusive growth in the City of Boston. The indicators were recorded monthly between January 2013 and December 2019. BPDA gathers and analyzes a wide range of economic data on employment, housing, travel, and real estate development. I removed all null values from the dataset.

November 10, 2023

The decision tree divides the data into branches that help reveal patterns and relationships. It can show whether individuals of a particular race and age group are more or less likely to be involved in police shootings, and how different types of police shootings have occurred. The results give a clear, descriptive way to understand the complex relationships in the data. They also help identify specific situations where police shootings are more common in certain age and ethnic groups. This can be a powerful tool for uncovering biases or trends that are not immediately obvious from the raw data.
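
A decision tree builds its branches by repeatedly choosing the split that best separates the classes. The sketch below shows a single such split on one numeric feature, scored by Gini impurity, which is one common criterion. The ages, the class labels "A" and "B", and the function names are hypothetical and stand in for whatever columns the real tree used.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure group)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_threshold(values, labels):
    """Find the threshold on one numeric feature with the lowest weighted Gini."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical ages and outcome categories.
ages = [20, 22, 25, 40, 45, 50]
categories = ["A", "A", "A", "B", "B", "B"]
threshold, impurity = best_threshold(ages, categories)
print(threshold, impurity)
```

A full tree applies the same search recursively to each resulting branch and across all available features.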

October 27, 2023

DBSCAN (Density-Based Spatial Clustering of Applications with Noise): This algorithm forms clusters based on the density of data points. It groups points that are closely packed together and marks points in low-density regions as outliers. DBSCAN can find clusters of any shape and is good at separating high-density clusters from noise in the data.
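
The description above can be sketched in a few lines of standard-library Python. This is a minimal, unoptimized version for illustration: the parameter names eps (neighborhood radius) and min_samples (points needed to form a dense region, counting the point itself) follow common usage, and the sample points are invented.

```python
import math

def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE            # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels

# Two tight groups and one isolated point (made-up coordinates).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

The isolated point receives the label -1, which is how the algorithm marks outliers in low-density regions.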

October 25, 2023

K-means: This method groups data into ‘k’ number of clusters by reducing the distance between data points and the center of their assigned cluster. The ‘means’ in the name refers to averaging the data points to find the center of the cluster. It’s good for spherical-shaped clusters.
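
A minimal sketch of the K-means loop (often called Lloyd's algorithm) is shown below. To keep the example deterministic, the starting centers are passed in explicitly, which is an assumption of this sketch; real implementations usually pick them randomly or with a seeding heuristic. The points are invented.

```python
import math

def k_means(points, centers, iterations=100):
    """Alternate between assigning points to the nearest center and re-averaging."""
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # The new center is the coordinate-wise mean of the assigned points.
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
        if new_centers == centers:
            break                        # assignments have stopped changing
        centers = new_centers
    return centers, clusters

points = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
centers, clusters = k_means(points, centers=[(0, 0), (10, 10)])
print(centers)
```

Each pass can only keep or reduce the total squared distance between points and their centers, which is why the loop settles after a few iterations on well-separated data like this.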

K-medoids: Similar to K-means, but instead of using the mean, it uses actual data points as the center of the cluster, known as medoids. This method is more robust to noise and outliers compared to K-means because medoids are less influenced by extreme values.
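
The difference between the two kinds of cluster center can be seen on a single cluster that contains one outlier. The sketch below compares the mean (as used by K-means) with the medoid (as used by K-medoids); the points and function names are made up for illustration, and it shows only the center computation, not a full K-medoids algorithm.

```python
import math

def centroid(points):
    """Coordinate-wise mean of the points (the K-means center)."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def medoid(points):
    """The actual data point with the smallest total distance to all others."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))

# Four nearby points plus one extreme value.
cluster = [(1, 1), (2, 2), (3, 3), (4, 4), (100, 100)]
print(centroid(cluster))  # (22.0, 22.0)
print(medoid(cluster))    # (3, 3)
```

The outlier drags the mean far away from the bulk of the data, while the medoid stays on a typical point, which is the robustness property described above.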

October 23, 2023

Effect Size: Effect size measures the magnitude of an observed pattern. It tells us not only that there is a difference, but how big that difference is. Computing effect sizes for age differences between racial groups in police shootings tells us not only whether the differences are statistically significant, but also how large they are in practical terms. This helps us understand the true importance of the findings, rather than simply recognizing that the differences are unlikely to be due to chance.
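
One widely used effect size for a difference between two group means is Cohen's d, the difference in means divided by the pooled standard deviation. The entry above does not say which measure was used, so treating it as Cohen's d is an assumption of this sketch, and the ages are invented.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Difference in means expressed in units of the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * statistics.variance(group_a)
        + (nb - 1) * statistics.variance(group_b)
    ) / (na + nb - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(pooled_var)

# Hypothetical ages for two groups.
ages_group_1 = [2, 4, 6]
ages_group_2 = [1, 3, 5]
d = cohens_d(ages_group_1, ages_group_2)
print(d)  # 0.5
```

By Cohen's conventional guidelines, values around 0.2, 0.5, and 0.8 are read as small, medium, and large effects.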

October 16, 2023

Statistical Significance: This helps us determine whether findings from the data, such as differences in the ages of different groups of people killed by police, reflect real patterns or could have arisen by chance. If a result is statistically significant, we are reasonably confident that the patterns we see in the data (such as one group being younger or older than another) are real and not just random variation.
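
One intuitive way to ask whether a difference "could have just happened" is a permutation test: shuffle the group labels many times and count how often a difference at least as large as the observed one appears by chance. The sketch below is one possible approach, not necessarily the test used for this analysis; the data, the seed, and the number of shuffles are assumptions.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group membership at random
        diff = abs(statistics.mean(pooled[:size_a]) - statistics.mean(pooled[size_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Made-up ages: clearly separated groups versus identical groups.
p_separated = permutation_p_value(list(range(1, 11)), list(range(101, 111)))
p_identical = permutation_p_value([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
print(p_separated, p_identical)
```

A small p-value means random relabeling almost never reproduces the observed gap, which is what "statistically significant" expresses.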

October 13, 2023

If we want to see how age and race might affect certain outcomes, such as police contacts, it is inefficient to run a separate test for each category. ANOVA is a technique that lets us look at all the pieces of the puzzle at once. It is a single test that can tell us whether age, race, or a combination of both has an effect.
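
The core of ANOVA is the F statistic, which compares the variation between group means with the variation inside the groups. The sketch below computes it for the simplest case, a one-way ANOVA with a single grouping factor; testing two factors and their interaction, as described above, extends the same idea. The groups are invented and the function name is an assumption.

```python
def one_way_anova_f(groups):
    """F statistic: between-group mean square divided by within-group mean square."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Three hypothetical groups whose means differ by one unit each.
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_stat)  # 3.0
```

A large F value means the group means differ more than the spread within groups would suggest; the p-value is then read from the F distribution with the two degrees of freedom shown in the code.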

November 8, 2023

Means: This refers to the average age of people in each racial group who were killed by police. Comparing the averages shows whether there is a significant difference between racial groups. We might find that one group tends to be older or younger on average when they are killed by police. Knowing the average ages can help us understand broader patterns and possibly identify factors that might contribute to these tragic events.

November 6, 2023

Variance tells us how spread out the ages are in each racial group of people killed by police. If everyone were the same age, variance would be zero because there’s no spread. But in real life, ages vary, so we get a number that tells us how much they differ from the average age. Different racial groups have different variances in age. This is important because when we use methods like ANOVA (Analysis of Variance), we assume that these variances are roughly the same across groups. But that’s not the case here, which means we might need to use different statistical methods that don’t make this assumption to accurately understand the data.
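
A quick way to see how unequal the group variances are is to compare the largest and smallest sample variance. For two groups, Welch's t-test is one standard alternative that does not assume equal variances. The sketch below shows both calculations on invented data; the entry above does not name a specific alternative method, so Welch's test is offered only as an example.

```python
import math
import statistics

def variance_ratio(groups):
    """Largest sample variance divided by the smallest."""
    variances = [statistics.variance(g) for g in groups]
    return max(variances) / min(variances)

def welch_t(group_a, group_b):
    """Welch's t statistic and its approximate degrees of freedom."""
    va = statistics.variance(group_a) / len(group_a)
    vb = statistics.variance(group_b) / len(group_b)
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(group_a) - 1) + vb ** 2 / (len(group_b) - 1))
    return t, df

# Hypothetical groups with clearly different spreads.
narrow = [1, 2, 3, 4, 5]
wide = [2, 4, 6, 8, 10]
ratio = variance_ratio([narrow, wide])
t_stat, dof = welch_t(narrow, wide)
print(ratio, t_stat, dof)
```

A ratio close to 1 supports the equal-variance assumption, while a large ratio is a signal to prefer a method such as Welch's.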

November 3, 2023

ANOVA and t-tests produce p-values, which give the probability of observing a mean difference at least as large as the one in the data if the null hypothesis were true. From a Bayesian standpoint, this so-called frequentist method of assessing a difference in means is considered fundamentally flawed, particularly in cases where it is evident that the null hypothesis is false. However, a Bayesian approach to inference for the age and race data appears to be doomed, mainly because, although the data exist, we do not know of any prior distributions, have no idea how those prior distributions might look, and have no idea in what sense the age and race data are evidence of change.

November 1, 2023

The median age of the Hispanic group is 32 years, meaning that half of the group is younger and the other half older. The mean age of 33.59 years is slightly above the median, suggesting some right skew in the age distribution. The spread of ages in this group is indicated by the standard deviation, which is 10.74 years. The skewness of 0.803 indicates a moderate right skew, with a greater proportion of younger than older people. The kurtosis of 3.725, which is higher than the value of 3 for a normal distribution, indicates that the age distribution for Hispanics has a sharper peak and heavier tails.

The Native American group shows a fairly symmetric age distribution with a mean age of 32 years. Its ages are slightly less spread out than those of the Hispanic group, with a standard deviation of 8.949 years. The skewness of 0.565 indicates a slight right skew, smaller than in the Hispanic distribution. The kurtosis of 2.883, slightly below the normal-distribution value of 3, suggests that the age distribution has somewhat lighter tails and a less prominent peak than a normal distribution.
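
The summary statistics quoted for each group can be computed as in the sketch below. It uses moment-based skewness and kurtosis on the scale where a normal distribution has kurtosis 3, which matches the comparison made above; statistical packages may apply small-sample corrections or report excess kurtosis (kurtosis minus 3) instead. The ages are invented for illustration.

```python
import statistics

def shape_statistics(values):
    """Mean, median, standard deviation, skewness and kurtosis of a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n   # fourth central moment
    return {
        "mean": mean,
        "median": statistics.median(values),
        "std_dev": statistics.stdev(values),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,   # equals 3 for a normal distribution
    }

# Hypothetical samples: one symmetric, one with a long right tail.
symmetric = shape_statistics([1, 2, 3, 4, 5])
right_skewed = shape_statistics([1, 1, 2, 2, 3, 10])
print(symmetric)
print(right_skewed)
```

In the right-skewed sample the mean sits above the median and the skewness is positive, the same pattern described for the Hispanic group.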