Friday, July 27, 2012

How to assemble the Voter ID Database

Here is a technical step by step for people who would like to assemble this data and do their own analysis.
  
This document is a step by step description of steps taken to join the voter files with the "no id" and
"expired id" files and produce the crosstabs.

The voter registration file is maintained by the City of Philadelphia and is a public document. We requested it from City Commissioner Stephanie Singer's office. Requestors must certify that it will not be used for commercial purposes like mailing lists. 

The "no id" and "expired id" files were provided by the Pennsylvania Secretary of State to city and county registrars last week. The contain the state voter number for everyone on the list and can be linked to the voter registration file.  These are also public documents and can be requested from the Secretary of State. 

This was done in Visual FoxPro 6.0 but these commands
would work in any SQL database with simple adjustments. I would encourage/beg others to create the database do the analysis for themselves.


Create voter table for Philadelphia from delimited text file.
create table philavoters free ;
(ID C (12),;
Pref C (4),;
Last C (25),;
First C (25),;
Middle C (15),;
Suffix C (5),;
Sex C (1),;
DOB D,;
Dreg D,;
Status C (1),;
Changedt D,;
Party C (20),;
House I ,;
HouseNo C (4),;
Street C (50),;
Apt C (15),;
Addr_2 C (30),;
City C (35),;
State C (2),;
Zip C (15),;
MAdd_1 C (10),;
MAdd_2 C (10),;
MCity C (20),;
MState C (2),;
MZip C (15),;
Pollp C (50),;
Pollpl2 C (128),;
PollCSZ C (128),;
Lastvote D (8),;
Dist1 C (15),;
Dist2 C (15),;
Dist3 C (15),;
Dist4 C (15),;
Dist5 C (15),;
Dist6 C (15),;
Dist7 C (15),;
Dist8 C (15),;
Dist9 C (15),;
Dist10 C (15),;
Dist11 C (15),;
Dist12 C (15),;
Dist13 C (15),;
Dist14 C (15),;
Dist15 C (15),;
Custom1 C (25),;
Lastchange D (8),;
PR042412 C (10),;
PR042412VM C (10),;
GN110811 C (10),;
GN110811VM C (10),;
PR051711 C (10),;
PR051711VM C (10),;
SP020111 C (10),;
SP020111VM C (10),;
GN110210 C (10),;
GN110210VM C (10),;
PR051810 C (10),;
PR051810VM C (10),;
GN110309 C (10),;
GN110309VM C (10),;
PR051909 C (10),;
PR051909VM C (10),;
GN110408 C (10),;
GN110408VM C (10),;
PR042208 C (10),;
PR042208VM C (10))
append from voters2.txt type delimited with character "|"

1025829 records created. This represents all voters, active and inactive.

Add an index on id
Modify philavoters table to add the following fields.
Noid char(1) - flag for voters in the state's "no id" file
Expired char(1) - flag for voters in the "expired id" file.
Age integer - age calculated by subtracting Date of Birth from today's date.
Age range char(8) - range to be calculated from Age field.



Create noid table from the state's "no id" data release.

create table no_id free ;
(ID C (12),;
Las C (25),;
Fir C (25),;
Mid C (15),;
DOB C (8))
append from no_voter_id_per_state.csv type delimited

186830 records created. This represents all records in the "no id" file for Philadelphia, active
and inactive.

Add an index on id.


Create table from "expired" file from the state.
The file I received contained only a list of state voter ids.

create table expire free ;
(ID C (12))
append from expire.txt type delimited

574631 Records created. (this is more than just Philly voters; it is presumably a statewide file)

Add an index on id



Tables created, begin analysis.

Overall count of voters

Select count(*) from philavoters
Result: 1,025,829

Count of "active" voters:

Select count(*) from philavoters where status='A'
Result: 868,674

Query for records in the Philly voter file with IDs that are contained in the "no_id" file
select count(*) from Philavoters where id in (select id from no_id)
Result: 186,560 ...this is the number of ALL philly voters in the "no id" file.

Same query, this time limiting for voters on "active" status. This means voters who have voted in last X
years.

select count(*) from Philavoters where status='A' and id in (select id from no_id)
Result: 135,859

Query for records in the Philly voter file with IDs that are contained in the "expired" file

select count(*) from Philavoters where id in (select id from expire)
result: 175,769
select count(*) from Philavoters where status='A' and id in (select id from expire)
result: 146,750


Now, is there any overlap between the "no id" and the "expired" file?

select count(*) from expire where id in (select id from no_id)
Result: 0
Just for good measure, reverse it:
select count(*) from no_id where id in (select id from expire)
result: 0

So - there is no overlap between "no id" and "expired" sets from the state. They are different
subsets of voters.


Now, put the data into the main table using the fields we created:
update Philavoters set noid='1' where id in (select id from no_id)
update Philavoters set expired='1' where id in (select id from expire)

How many voters are either no id or expired?
select count(*) from Philavoters where Noid='1' or expired='1'
Result: 362,329
How many of these are active?
select count(*) from Philavoters where (Noid='1' or expired='1') and Status='A'
Result: 282,609


How many are both (just to double check our data)?
select count(*) from Philavoters where Noid='1' and expired='1'
Result: 0

For convenience, populate field "either" which represents voter is either "no id" or "expired"
THIS IS THE GROUP THAT THE STATE HAS IDENTIFIED WILL NOT BE ABLE TO VOTE
WITHOUT GETTING AN ID
update philavoters set either='1' where Noid='1' or expired='1'
362,369 records updated.


Populate age field by calculating from DOB (this will vary by database)
update philavoters set Age=(date()-dob)/365.25

Populate age range field
update philavoters set age_range="und_25" where age<25
update philavoters set age_range="25-34" where age<35 and age >=25
update philavoters set age_range="35-44" where age<45 and age >=35
update philavoters set age_range="45-49" where age<50 and age >=45
update philavoters set age_range="50-54" where age<55 and age >=50
update philavoters set age_range="55-59" where age<60 and age >=55
update philavoters set age_range="60-64" where age<65 and age >=60
update philavoters set age_range="65-69" where age<70 and age >=65
update philavoters set age_range="70-74" where age<75 and age >=70
update philavoters set age_range="75-79" where age<80 and age >=75
update philavoters set age_range="80-84" where age<85 and age >=80
update philavoters set age_range="85-" where age >=85


Cross-tabulate age range with "either" (all voters)
SELECT Philavoters.age_range, Philavoters.either, count(*);
 FROM philavoters;
 GROUP BY Philavoters.age_range, Philavoters.either;
 ORDER BY Philavoters.age_range, Philavoters.either;
 INTO CURSOR SYS(2015)
 DO (_GENXTAB) WITH 'Query1'


Note: c_1 in this table means the voters have ID problems of one kind or another. Blank means
they have a valid ID in the state licensing database according to Secretary of State.

age_range
ID problem
 Has id
und_25
52151
46052
25-34
85609
158700
35-44
58156
121515
45-49
30643
58415
50-54
31264
58705
55-59
26077
57193
60-64
19876
48445
65-69
15217
36635
70-74
11861
25664
75-79
9889
19806
80-84
8565
15542
85-
13021
16828



Same crosstab but including only "active" voters.

SELECT Philavoters.age_range, Philavoters.either, count(*);
 FROM philavoters;
 WHERE Philavoters.status = "A";
 GROUP BY Philavoters.age_range, Philavoters.either;
 ORDER BY Philavoters.age_range, Philavoters.either;
 INTO CURSOR SYS(2015)
 DO (_GENXTAB) WITH 'Query1'

Note: c_1 in this table means the voters have ID problems of one kind or another. Blank
means they have a valid ID in the state licensing database according to Secretary of State.
Includes only "active" voters


age_range
id problem
has id
und_25
48623
44087
25-34
57389
134129
35-44
40714
102937
45-49
23402
51109
50-54
24923
52595
55-59
21708
52043
60-64
16768
44695
65-69
13106
33835
70-74
10286
23842
75-79
8634
18399
80-84
7328
14278
85-
9728
14116


No comments:

Post a Comment