************************************************.
** SPSS sample syntax file for the chapter: .
** Lambert, P.S. "Advances in data management for social survey research",
** in R. Procter and P. Halfpenny
** (Eds) 'Innovations in Digital Research Methods', London: Sage.
** Background: .
** This file is referred to in 'Table 1' of the above chapter.
** It calls upon a data file which is available to download from the book's website.
** This example was originally written by Paul Lambert for SPSS v19
** (it is expected to be consistent across all recent SPSS versions).
************************************************.
************************************************.
** Instructions.
** Run the active lines (or groups of lines) in sequence, reflecting on the results
** as they emerge on the 'outputs' window ('active' lines are those
** which don't begin with the comment symbol '*'). Lines are run by highlighting
** all or part of the text of the line, and clicking 'run -> selection' (or 'ctrl'+r).
** For more extended information on processing commands through SPSS syntax files,
** see the materials for the DAMES project workshop on 'Documentation and workflows
** in social survey research', at http://www.dames.org.uk/workshops/ .
************************************************.
*** Precursors: .
** Declaration of the location of the accompanying data file.
* (edit the text below to a suitable location for the file on your own machine).
define !datafile2 () "H:\csdp\data\anon_survey_data_orig.sav" !enddefine.
* Declaration of the location of the file 'gb91soc90', which is an 'index file' linking occupational
* units with derived measures about them (also available from GEODE, www.geode.stir.ac.uk).
* (this file is actually in Stata format, but it can be read in SPSS).
define !occfile1 () "H:\csdp\data\gb91soc90.dta" !enddefine.
** Declaration of a location on your computer where temporary copies of data can be saved:.
define !tempdir () "H:\temp\" !enddefine.
************************************************.
************************************************.
************************************************.
************************************************.
*** Open the data file and review it:.
get file=!datafile2.
descriptives var=all.
************************************************.
************************************************.
*** Data management (1):
** Review of the data and construction of variables and cases
** to suit regression analysis .
** Analysis scenario: we have an outcome measure of self-reported time spent on housework,
** and we're interested in the joint relative influences upon it from a small range of
** socio-economic and socio-demographic measured factors .
** Outcome variable in its natural units.
fre var= hrhswrk.
select if hrhswrk >= 0 & hrhswrk <= 98. /* restrict analysis to those reporting hours of housework between 0 and 98 */
graph /histogram=hrhswrk .
* Consider taking square root of the outcome to downplay the influence of small numbers of highest values.
compute hrhswrk2=sqrt(hrhswrk).
variable label hrhswrk2 "Square root of hours spent per week on housework". /* adding metadata */
graph /histogram=hrhswrk2 .
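* A possible check, added here as a sketch rather than as part of the chapter's example:
* comparing skewness and kurtosis for the raw and square-root versions of the outcome
* can help to judge whether the transformation reduces the influence of the highest values.
descriptives var=hrhswrk hrhswrk2 /statistics=mean stddev skewness kurtosis.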
* Consider taking a dichotomisation of outcome variable to support a clear substantive interpretation: .
compute hihswrk=(hrhswrk >= 20).
variable label hihswrk "Number of hours of housework per week". /* metadata on the variable label */
add value labels hihswrk 0 "Less than 20 hours" 1 "20 or more hours per week" . /* attaches metadata in the form of value labels */
fre var=hihswrk.
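* A possible check (a sketch, not part of the original example): the minimum and maximum
* of the original hours within each category should confirm that the cut-point at 20 hours
* has been applied as intended.
means tables=hrhswrk by hihswrk /cells=count min max.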
** Explanatory variables.
* Some applications require a single constant variable to be explicitly generated:.
compute cons=1.
fre var=cons.
* Self-rated health and age in years: .
fre var=health age. /* These are probably ok to use in their default linear functional forms */
* Educational level: .
fre var=quals .
* As this variable is complex, recode it into four categories, then generate dummy variables for them .
compute educ4=quals.
recode educ4 (1 2 = 1) (3 4 5 = 2) (6 7 8 10 11 = 3) (9 12=4) (else=-9).
/* 'else=-9' means all other values coded to -9, a code that can be used to indicate missing values */
add value labels educ4 1 "Degree" 2 "Diploma level" 3 "Higher School level"
4 "Low school level or below" . /* attaches value labels to the four categories of educ4 */
missing values educ4 (-9). /* states that the -9 is to be treated as missing */
fre var=educ4.
compute educ4_1 = (educ4=1).
compute educ4_2 = (educ4=2).
compute educ4_3 = (educ4=3).
compute educ4_4 = (educ4=4).
fre var=educ4_1 to educ4_4.
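* A possible check (a sketch, not part of the original example): for every case with a
* valid value on educ4, exactly one of the four dummy variables should equal 1, so their
* sum should always be 1. The name 'educchk' is made up for this illustration, and the
* 'temporary' command means that it is dropped again after the frequency table is produced.
temporary.
compute educchk = educ4_1 + educ4_2 + educ4_3 + educ4_4.
fre var=educchk.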
* Marital status: .
fre var=mastat. /* For this variable, reduce it down to a two-category contrast */
compute cohab=mastat.
recode cohab (1 2 7=1) (3 thru 6=0) (else=-9).
missing values cohab (-9).
cro tables=mastat by cohab.
* Employment status: .
fre var=jobstat . /* For this variable, reduce it down to a two-category contrast */
compute work=jobstat.
recode work (1 2 3=1) (4 thru 10=0) (else=-9).
missing values work (-9).
cro tables=jobstat by work .
* Gender:.
fre var=sex.
compute fem=(sex=2) ./* Dummy variable for being female */
cro table= fem by sex .
descriptives var= hrhswrk hrhswrk2 hihswrk fem cohab age health educ4_1 to educ4_4 work .
/* these are variables for all the measures of interest, in suitable functional forms */
** Listwise deletion of missing data: .
compute missvars=nmiss(hrhswrk2, fem, cohab, age, health, educ4_1, educ4_2 , educ4_3 , educ4_4, work).
fre var=missvars. /* shows number of missing values across these variables */
select if (missvars=0). /* Deletes cases which have missing values on any of these variables */
descriptives var= hrhswrk hrhswrk2 hihswrk fem cohab age health educ4 work ./* Analysis now restricted to 1237 valid cases */
****************************************.
*** Preparation: Construct gender interaction terms with other variables,
* since descriptive analysis and substantive results suggest they might be relevant: .
compute femcoh=fem*cohab.
compute femage=fem*age.
compute fem_educ1=fem*educ4_1.
compute fem_educ2=fem*educ4_2.
compute fem_educ3=fem*educ4_3.
compute fem_educ4=fem*educ4_4.
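* A possible check (a sketch, not part of the original example): an interaction term between
* two dummy variables should equal 1 only when both of its components equal 1.
cro tables=femcoh by fem by cohab.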
descriptives var=all.
*****************.
** For information, the file now in memory is also the data file used in chapter 7 .
*****************.
****************************************.
*** A few examples of analysis with the current dataset : .
** Bivariate and descriptive results.
correlations var= hrhswrk hrhswrk2 age health.
graph /bar = mean(hrhswrk) by educ4 by sex.
graph /bar = mean(hrhswrk2) by educ4 by sex.
sort cases by sex.
split files by sex.
graph /bar = mean(hrhswrk) mean(hrhswrk2) by educ4.
split files off.
means tables=hrhswrk hrhswrk2 by sex by cohab /cells=mean semean count .
sort cases by sex.
split files by sex.
means tables=hrhswrk hrhswrk2 by cohab /cells=mean semean count /statistics=anova.
split files off.
cro table=hihswrk by work cohab by sex /cells=count col /statistics=chisq phi .
** Running three linear regression models and comparing their results: .
* 1).
regression /var= hrhswrk2 fem age cohab health /dep=hrhswrk2 /method=enter.
* 2).
regression /var= hrhswrk2 fem age cohab health educ4_1 educ4_2 educ4_4 work
/dep=hrhswrk2 /method=enter.
* 3).
regression /var= hrhswrk2 fem age cohab health educ4_1 educ4_2 educ4_4 work
femcoh fem_educ1 fem_educ2 fem_educ4
/dep=hrhswrk2 /method=enter.
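* An alternative way of comparing the three models (a sketch, not part of the original example):
* the same sets of explanatory variables can be entered as successive blocks within a single
* regression command, and the 'change' keyword requests the change in R-squared, with its
* F test, as each block is added.
regression /statistics=coeff r anova change
/dep=hrhswrk2
/method=enter fem age cohab health
/method=enter educ4_1 educ4_2 educ4_4 work
/method=enter femcoh fem_educ1 fem_educ2 fem_educ4.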
*************************************************.
*************************************************.
************************************************.
************************************************.
*** Data management preparation (2).
*** Matching external data .
** We have some longer instructions on match-merging datasets at www.dames.org.uk/workshops.
** Here is a brief illustrative example.
** The current dataset in memory.
descriptives var=all.
* This happens to have a measure, 'mrjsoc' that gives SOC90 occupational unit group codes for the
* most recent job of the respondents.
fre var=mrjsoc .
** Task: We would like to attach some occupation-based measures to the appropriate soc90 codes
* (a common data management requirement, and one central to the DAMES node provisions).
** An external file (also available online) that gives occupational codes and derived measures is
** defined at the top of this file as '!occfile1'.
** This file has 'soc90' as a variable in SOC90 units.
** It also has values of employment status in the variable 'ukempst', for which the value zero means 'unknown'.
** Here is a sequence of lines that will attach the measures 'mcamsis' and 'fcamsis' from the 'occfile1'
** file to the current dataset.
sav out = !tempdir+"m1.sav". /* saves a copy of the current dataset to another location */
get stata file=!occfile1 .
descriptives var=all.
rename variables soc90= mrjsoc. /* rename the index variable so that its name matches the survey variable */
sort cases by mrjsoc ukempst. /* sort in order of match-merge variables - a requirement for efficient merging */
sav out=!tempdir+"m2.sav" /keep mrjsoc ukempst mcamsis fcamsis.
/* save a temporary copy of the index file to be available for merging, keeping only 4 variables in the dataset */
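* A possible check (a sketch, not part of the original example): a 'table' file used in a
* many-to-one merge should have only one record for each combination of the matching variables.
* The flag below (its name 'keyfirst' is made up for this illustration) equals 1 for the first
* record in each combination of mrjsoc and ukempst, so any values of 0 would indicate duplicates.
match files file=* /by=mrjsoc ukempst /first=keyfirst.
fre var=keyfirst.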
get file=!tempdir+"m1.sav". /* reopens original file */
descriptives var=all. /* 1237 cases in this data */
compute ukempst=0 . /* a new variable to indicate unknown employment status */
sort cases by mrjsoc ukempst .
match files file=* /table=!tempdir+"m2.sav" /by=mrjsoc ukempst . /* merge 'many to one' with the other temporary file according to the index variables mrjsoc and ukempst */
descriptives var=all. /* we now have two new variables on the file, mcamsis and fcamsis */
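* A possible check (a sketch, not part of the original example): cases whose occupational code
* was not found in the index file will have missing values on the new variables (this assumes
* that the index file has no missing values of its own on mcamsis). The name 'nomatch' is made up
* for this illustration, and it is dropped again after the frequency table is produced.
temporary.
compute nomatch = missing(mcamsis).
fre var=nomatch.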
** Example: Analysis using the new variables (mcamsis = occupational advantage score).
* 1).
regression /var= hrhswrk2 fem age cohab health /dep=hrhswrk2 /method=enter.
* 2).
regression /var= hrhswrk2 fem age cohab health educ4_1 educ4_2 educ4_4 work
/dep=hrhswrk2 /method=enter.
* 3).
regression /var= hrhswrk2 fem age cohab health educ4_1 educ4_2 educ4_4 work
femcoh fem_educ1 fem_educ2 fem_educ4
/dep=hrhswrk2 /method=enter.
* 4).
descriptives var=mcamsis.
compute femcam=fem*mcamsis .
regression /var= hrhswrk2 fem age cohab health educ4_1 educ4_2 educ4_4 work mcamsis
femcoh fem_educ1 fem_educ2 fem_educ4 femcam
/dep=hrhswrk2 /method=enter.
*************************************************.
*************************************************.
** Lastly, save the file in its current form: .
sav out=!tempdir+"anon_survey_data_2.sav".
descriptives var=all. /* i.e., this is a new data file saved in the temporary folder, available for further analysis */
** .
*************************************************.
*************************************************.