Econometrics: coding on RSTUDIO - US Presidential Elections

Oboulo Int.

Tutorials/exercises Format .docx

Themes

Econometrics, coding, RSTUDIO, presidential elections, USA United States of America, data analysis

Abstract

This coding exercise consists of one long analysis of data from the 2020 Presidential elections in the United States.

Contents

Loading the Data
Understanding what the dataset contains
Filtering the data and exploring further
Checking quality of the data
Some more tidying
Maps
Tidying covariate data
Regression analysis

Extract

[...] Econometrics: coding on RSTUDIO - US Presidential Elections

Loading the Data

1*. elections <- read.csv("/path/to/countypres_2000-2020.csv")

Understanding what the dataset contains

1. tail(elections, 10)

2. table_years <- table(elections$year)
   table_mode <- table(elections$mode)

[...] from the years [...] and 2020. The "mode" variable contains various types of voting methods or election modes. Each row represents the election results for a specific candidate in a specific county, for a given year and mode of voting.

number_of_variables <- ncol(elections)
12 variables

number_of_rows <- nrow(elections)
72,617 rows

Filtering the data and exploring further

1*. elections_2020 <- filter(elections, year == 2020)

2. [...]

[...] reg2: R² is likely higher, indicating additional variance explained by including average income and manufacturing share.
reg3: Adds education level; R² shows variance explained by unemployment, income, manufacturing, and education.
reg4: Includes demographic characteristics; likely has the highest R², showing variance explained by all included economic, educational, and demographic variables.

The change in the coefficient for pct_unemp_rate from reg1 to reg2 suggests the following:
Significant Change: Indicates that mean_income_thousands and pct_manufac confound the relationship between unemployment rate and Trump's vote share. [...]
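The confounding argument above can be illustrated with a small simulation. This is a minimal sketch on made-up data, not the exercise's regressions: the variable names `vote`, `unemp`, and `income` and all coefficients are assumptions chosen so that income influences both unemployment and the outcome.

```r
# Simulated data only: none of these numbers come from the election dataset.
set.seed(42)
n <- 500

# Hypothetical confounder: income affects both unemployment and the outcome.
income <- rnorm(n, mean = 50, sd = 10)
unemp  <- 10 - 0.1 * income + rnorm(n)
vote   <- 30 + 1.0 * unemp - 0.5 * income + rnorm(n, sd = 2)

toy <- data.frame(vote, unemp, income)

# Short model omits the confounder; long model controls for it.
reg_short <- lm(vote ~ unemp, data = toy)
reg_long  <- lm(vote ~ unemp + income, data = toy)

coef_short <- unname(coef(reg_short)["unemp"])
coef_long  <- unname(coef(reg_long)["unemp"])
r2_short   <- summary(reg_short)$r.squared
r2_long    <- summary(reg_long)$r.squared

print(c(coef_short = coef_short, coef_long = coef_long))
print(c(r2_short = r2_short, r2_long = r2_long))
```

In this setup the short model's coefficient on `unemp` absorbs part of the income effect, so it moves once `income` is added, and R² cannot decrease when a regressor is added to an OLS model fitted on the same sample.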

[...] candidate_names <- table(elections_2020$candidate)

2*. elections_2020_clean <- elections_2020 %>%
      mutate(
        candidatevotes = ifelse(is.na(candidatevotes), [...], candidatevotes),
        totalvotes_county = ifelse(
          is.na(totalvotes_county),
          sum(candidatevotes, na.rm = TRUE),
          totalvotes_county
        )
      )

3*. elections_2020_clean <- elections_2020_clean %>%
      group_by(state, state_po, county_name, county_fips, totalvotes, candidate) %>%
      summarise(candidatevotes_sum = sum(candidatevotes, na.rm = TRUE)) %>%
      mutate(pct_votes = (candidatevotes_sum / totalvotes) * 100) %>%
      group_by(county_fips) %>%
      mutate(candidate_rank = rank(-candidatevotes_sum)) %>%
      ungroup()

4. invalid_pct_votes <- filter(elections_2020_clean, pct_votes > 100)
   invalid_count <- nrow(invalid_pct_votes)

Maps

2. load("path/to/county_shfl.RData")

   fleming_fips_elections <- elections_2020_clean %>%
     filter(toupper(county_name) == "FLEMING") %>%
     select(county_fips) %>%
     unique()

   fleming_fips_shfl <- county_shfl %>%
     filter(toupper(NAME) == "FLEMING") %>%
     select(STATEFP, COUNTYFP, GEOID)

   FIPS code for "FLEMING" county is [...]

   typeof(elections_2020_clean$county_fips)
   typeof(county_shfl$GEOID)
   float

4. [...]
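The share, rank, and validity-check steps in the extract above can be tried on a toy table. The sketch below is an illustration under assumptions: the counties and vote counts are invented, and it uses base R (`ave`, `tapply`) rather than the dplyr verbs of the exercise so that it runs without extra packages.

```r
# Toy data: two hypothetical counties with made-up vote counts.
toy <- data.frame(
  county_fips    = c(1001, 1001, 1001, 1003, 1003),
  candidate      = c("A", "B", "C", "A", "B"),
  candidatevotes = c(600, 350, 50, 200, 800),
  totalvotes     = c(1000, 1000, 1000, 1000, 1000)
)

# Vote share in percent, as in the pct_votes step.
toy$pct_votes <- toy$candidatevotes / toy$totalvotes * 100

# Rank within each county; rank(-x) assigns 1 to the largest count.
toy$candidate_rank <- ave(toy$candidatevotes, toy$county_fips,
                          FUN = function(v) rank(-v))

# Quality checks: count shares above 100 and sum the shares per county.
invalid_count <- sum(toy$pct_votes > 100)
share_totals  <- tapply(toy$pct_votes, toy$county_fips, sum)

print(toy)
print(share_totals)
```

Ranking on the negated counts is the same idea as `rank(-candidatevotes_sum)` in the extract; grouping by county first is what makes the rank restart at 1 in each county.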

[...] In each regression model, the coefficient for pct_unemp_rate reflects the expected change in Trump's vote share for each one-percentage-point change in the unemployment rate, with varying levels of control for other factors:
reg1: Shows change without controlling for other factors.
reg2: Adjusts for economic variables like average income and manufacturing share.
reg3: Further includes control for the level of higher education.
reg4: Additionally accounts for demographic characteristics like race, age, gender, and county size.

In reg4, each coefficient shows how Trump's vote share percentage is expected to change with a unit change in each variable, controlling for other factors: pct_unemp_rate, mean_income_thousands, pct_manufac, pct_bachelor, pct_afam, pct_age_above_62, pct_female and county_size_thousands.

10. [...]

[...] acs_2019_covariates <- acs_2019_covariates %>%
  mutate(county_size_thousands = (n_hhs * avgsize_hhs) / 1000,
         mean_income_thousands = mean_income / 1000)

3. acs_2019_covariates <- acs_2019_covariates %>%
     select(NAME, GEO_ID, pct_unemp_rate, mean_income_thousands, pct_manufac,
            pct_bachelor, pct_afam, pct_age_above_62, pct_female,
            county_size_thousands)

4. geo_id_fleming_county <- acs_2019_covariates %>%
     filter(NAME == "Fleming County, Kentucky") %>%
     pull(GEO_ID)

   The value is '0500000US21069'.

5*. acs_2019_covariates <- acs_2019_covariates %>%
      mutate(county_fips_clean = substr(GEO_ID [...]

6. elections_2020_covariates <- left_join(elections_2020_clean, acs_2019_covariates, by = "common_column_name")

Regression analysis

3. elections_2020_reg_trump <- elections_2020_covariates %>%
     filter(candidate == "Donald Trump") %>%
     select(state, state_po, county_name, county_fips_clean, pct_votes,
            pct_unemp_rate, mean_income_thousands, pct_manufac, pct_bachelor,
            pct_afam, pct_age_above_62, pct_female, county_size_thousands) %>%
     rename(pct_votes_trump = pct_votes)

2. load("pre_cleaned_dataset_for_regression_analysis/elections_2020_reg_trump.RData")

   skim(elections_2020_reg_trump, pct_votes_trump, pct_unemp_rate,
        mean_income_thousands, pct_manufac, pct_bachelor, pct_afam,
        pct_age_above_62, pct_female, county_size_thousands)

3. [...]
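The key-cleaning and join steps above are truncated in the extract, so the following is only a sketch of one way they could work, not the exercise's own solution. It assumes the county FIPS code is the last five characters of GEO_ID, that the election table stores the code as a number, and that the shared key is called `county_fips_clean`; apart from the Fleming County GEO_ID quoted in the text, every value is invented. Base R `merge` stands in for `left_join`.

```r
# Toy rows: the Fleming County GEO_ID is the value quoted in the text;
# the second row and all covariate values are made up.
acs <- data.frame(
  NAME   = c("Fleming County, Kentucky", "Example County, Nowhere"),
  GEO_ID = c("0500000US21069", "0500000US99999"),
  pct_unemp_rate = c(5.0, 7.5),
  stringsAsFactors = FALSE
)
votes <- data.frame(
  county_fips = c(21069, 99999),
  pct_votes   = c(70, 40)
)

# Assumption: the county FIPS code is the last five characters of GEO_ID.
n_chars <- nchar(acs$GEO_ID)
acs$county_fips_clean <- substr(acs$GEO_ID, n_chars - 4, n_chars)

# Align key types before joining: numeric on one side, character on the other.
votes$county_fips_clean <- sprintf("%05d", as.integer(votes$county_fips))

# all.x = TRUE keeps every row of votes, which mirrors a left join.
joined <- merge(votes, acs, by = "county_fips_clean", all.x = TRUE)
print(joined)
```

Zero-padding to five characters matters for states whose FIPS code starts with 0, where a numeric column would otherwise drop the leading digit and fail to match.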
