The Perfect Climate Index, Take I

What is the ideal climate? After having recently experienced the silky sensual feeling of low day-night temperature difference (here), I’m adding it together with a bunch of usual suspects – high T max, long sunshine hours, litle precipitation – to form a composite measure of good weather: the Perfect Climate Index (PCI).

I’m testing the PCI on Switzerland, a country not exactly known as a weather paradise. All the more important to find the occasional dry and warm spot!

As you will see later, the PCI really sucks as a general (i.e. worldwide) measure of good climate, but it works for Switzerland. There is a challenge to make it globally applicable, and I will outline some ideas for how to do that. I was too lazy to do it myself this time, but who knows, maybe you’ll get inspired! Meanwhile, there is enough challenge in visualizing the data from over a hundred weather stations in my little home country. You can follow along, as I post the data and code for this example (see ‘Materials’ at the end).

The data from IDAweb cover the years 1968-2013, but I decided to use only the years since 1979, partly for compatibility with my previous post, but also because earlier data is sparse (see missing data plot below).

For each measure, IDAweb gives you a separate ASCII file, with lines corresponding to days. Here’s part of the T max data:

MeteoSchweiz/MeteoSuisse/MeteoSvizzera/MeteoSwiss

stn;time;tre200dx
TAE;19710101;-8.3
TAE;19710102;-7.4
TAE;19710103;-8.0
TAE;19710104;-5.7
TAE;19710105;-6.3

KLO;20131230;2.9
KLO;20131231;-0.3
KLO;20140101;7.1

 

You also get a list of station names, their three-letter codes, longitudes, latitudes and altitudes.

 

Plotting missing data

First, I want to get an idea of the data coverage. You can see that the T max file starts with jan 01, 1971, and not 1968. This is generally the case: the time series only include years for with there are at least some non-missing values. This leads to a situation of “hidden missings”, because years with all missings don’t show up anywhere.

To avoid this sort of thing, I’m reading the data into a prefab, initially empty file containing the full temporal grid of values from jan 01, 1968 to jan 01, 2014 for all stations. The file contains N stations x N years x N days = 2’184’390 observations – the largest data set I’ve ever dealt with (shame on me)!

To tackle the missings, let’s create a missing indicator (e.g mtmax) and then sum up the number of missings per station (nmisstmax).

// missing values
    foreach var of varlist rainfall sunshine tmin tmax {
        gen m`var' = date if mi(`var')
        format %9.0g m`var'
        label var `var' "non-missing values indicate that corresponding var is missing"
    }

    // collapsing over N of missings
    collapse ///
    (count) nmissrainfall    =mrainfall         ///
    (count) nmisssunshine    =msunshine         ///
    (count) nmisstmax        =mtmax             ///
    (count) nmisstmin        =mtmin             ///
    , by(stn year)
    encode stn, gen(stnn)

 

The variable containing the N missings is created by the -collapse- command, which, at the same time, collapses the data over stations. The resulting file has only one record per station containing its N missings-variable. In other words, we have gotten rid of all unnecessary baggage by reducing 2 million observations to 130! This data set can now be used for graphing the missings.

zmapv12_nmisstmax_grid

 

It might not look like much, but this graph was a killer! It took me hours of tinkering and it’s still not perfect; but it taught me quite a bit about graphing.

What it shows is easy enough to understand: the number of missings by station and year. The red lines indicate years with more than 360 days of data missing, which turns out in all cases to mean that the whole year is missing. The orange lines denote years with over 50% days missing. I didn’t add more categories since they made the display look crowded and most of these years had no missings, anyway.

You can see what I mentioned above: the data is sparse for the years up to about 1980, when ANETZ, a network of automated stations, became operational, and then there’s another improvement happening in the early 90s, for which I don’t know the reason.

Second, there are very few years with 180-360 days missing, indicating again that it’s either all or nothing: by and large, the data is missing by whole years or not at all.

What’s tricky about this graph? Although it looks like some sort of line graph, it is actually a scatter plot! And the red lines are not made of graphing symbols but of the ASCII underscore character stored in a special variable. So what’s graphed are not markers but marker labels, and here’s where the difficulties begin. You cannot control the plotting position of marker labels very finely, and that makes it hard to align them with the  grid lines. In fact, after positioning the labels as well as I could, I shifted the grid lines up a tiny bit to make them meet the labels. This, in turn, caused a slight displacement of the y-axis labels relative to the grid lines. At this point, no amount of tinkering would help to fix this.

Look at the graph code (note that all my graphs use my own LF3 graph scheme, which you can get in the ‘Materials’ section of my first post):

	gen nothing=""
    cap drop yr
    gen yr=year-1.25    
    gen marker="{superscript:__}"    

    foreach var of varlist nmissrainfall nmisssunshine nmisstmax nmisstmin {                
            zmapv12 `var' stnn yr if yr<2014, break(5 180 360)                                         /// =
              ms(none ..) mlabel(nothing nothing marker marker) mlabcolor(gs14 gs14                 ///
              orange cranberry) mlabp(4 ..)                                                         ///
              yaxis(1 2) ysc(on lc(white) axis(1)) ysc(on lc(white) axis(2))                         ///
              yla(1(2)120, val labsize(*.29) labgap(half_tiny) noticks axis(1))                     ///
              ymt(1.25(2)120.25, noticks axis(1) grid glw(vvvthin) glc(gs13) glpattern(solid))        ///
              yla(2(2)120, val labsize(*.29) labgap(half_tiny) noticks  axis(2))                     ///
              ymt(2.25(2)120.25, noticks axis(2) grid glw(vvvthin) glc(gs15) glpattern(solid))        ///
              xsc(on) xti(" ") xla(1970(5)2010, labsize(vsmall)) xmt(1968(1)2013)                     ///
              note(" ") ti("Missing values: `var'", size(medsmall))                                 ///
              text(120 1968 "-", color(orange) size(vsmall) placement(top) yaxis(1) margin(bottom) justification(left))         ///
              text(120 1971.5 "180-360", color(gs2) size(vsmall) placement(top) yaxis(1) margin(bottom) justification(left))    ///
              text(120 1976 "-", color(cranberry) size(vsmall) placement(top) yaxis(1) margin(bottom) justification(left))         ///
              text(120 1978.5 "360+", color(gs2) size(vsmall) placement(top) yaxis(1) margin(bottom) justification(left))         ///
              legend(off) plotregion(margin(0 1 2 10)) graphregion(margin(b-5)) xsize(3.0) aspectratio(0.765)                     ///
              name(zmapv12_`var'_grid, replace)
              graph export zmapv12_`var'_grid.png, name(zmapv12_`var'_grid) replace
    }

 

The string variable ‘marker’ is generated to hold the plotting character. The strange curly braces you can see inside the double quotes contain a Stata SCML directive to output the superscript of the character. I did this merely because it turned out that the superscripted version could be more precisely aligned with the grid lines.

The character is positioned using the -mlabposition()- option. The argument is the plotting position given as a clock-time. Thus, the value I chose, 4, corresponds to the position of the number 4 on a clock-face that is centered at the position of the data point (where usually the graphing symbol would be placed). That’s as precise as you can get.

Grid lines in Stata originate from tick marks, of which there are major and minor ones. To shift the grid up you can simply reposition the corresponding tick marks (minor ones, in this case) by a tiny fraction. That’s what -ymt(1.25(2)120.25, noticks grid- does: it adds 0.25 to the previous position of the ticks. Note that display of the ticks themselves is suppressed by the -noticks- option. Now the station names that label the major tick marks are slightly offset with respect to the grid lines. You might think you could fix this problem in the same way, by shifting the major ticks upward. However, you would then lose the station labels, because they are attached to the particular values of the ticks. If you shift the ticks up by placing them at higher values, those new values won’t be associated with labels anymore, and the station names will disappear. Now – why not just adjust the value label of the variable holding the station names? – Because value labels can contain only integers, no reals! And that’s why this story ends there.

The fact that the plot uses no graphing symbols means there can be no legend. At least none using Stata’s -legend- option. The legend you see is constructed using the -text- command that allows you to place text on a graph. I didn’t fully figure out its inner workings, but if you sit down for an hour with the help file, I’m sure you can. I found the text positions by trial-and-error.

Finally, the graph’s scatter plot nature is hidden behind -zmap-, a user-written program by seasoned Stata expert Nick Cox, which I adapted to my purposes. It lets you plot categories of a z-variable by y and x, and color-code them differently, by layering seperate scatter plots for each category over each other. [UPDATE: Brendan Halpin has suggested an alternative, easier way to construct this graph. See first comment below and my reply -13.05.14]

 

The Perfect Climate Index

When you get data from somewhere, it is usually messy. Not only can it be poorly documented and labeled, there can also be weird values and missing data codes that you need to straighten out. I’ve done this for the present data, but I’m much too unfamiliar with weather data to know what kinds of values for which measures are implausible. That’s one of the reasons I chose to use the median to construct the PCI, because it is not so dependent on outliers as the mean.

After averaging the single weather measures for each calendar day across the 35 years from 1979-2013, I construct the PCI as:

PCI = T max – (difference T max – T min) + 0.5*(sunshine hours) + log(rain fall [l/m2] + 1)

My definition of a perfect climate means it’s hot, the difference between day and night temperature is small, more sunshine is better, and so is less rain. I weighted sunshine duration by 0.5 so that two more hours correspond to the effect of one more degree Celsius. Rain fall has a large spread of values that I compressed using a log-transformation.

	// daily averages over years 1979 - 2013
	bysort station dayy: egen dtmax7913 = median(tmax)
	bysort station dayy: egen drain7913 = median(rainfall)
        ...

	// PCI 7913 daily, averaged over years
	bysort station dayy: gen dpci7913 = dtmax7913 - dtdiff7913 + .5*dsun7913 - log(drain7913 + 1)
	bysort station year: egen ydpci7913 = mean(dpci7913)

 

So where are Switzerland’s dry and sunny spots? And where is it wet and cold? Let’s look at the top and bottom 10 contenders:

xtl_dpci7913hilo_d_nomiss_u1000

 

Lugano and Locarno, both located south of the Alpes, come out on top, followed by some places in the western, French-speaking part of Switzerland (called “La Romandie”). The bottom 10 are spread throughout the German-speaking part, where, unfortunately, I myself live. By the way, Lugano and vicinity get quite a bit of rain, but sunshine, high T’s and low temperature differences drive up the PCI.

The code:

glabsort stationn dpci7913, gen(stationndpci7913) lab(stationndpci7913) sort(-)
...

	xtline dpci7913top dpci7913bottom, 											/// =
		recast(spike) i(stationndpci7913hilo) t(dayy) 							///
		lc("scheme p2" "scheme p1") lw(vvvthin) subtitle(, size(small))			///
		ytitle(" ", size(small)) ylabel(-10(10)20, labsize(small) 				///
		grid glwidth(vvvthin) glcolor(gs14) nogmin nogextend)   		 		///
		tti(" ") tsc(range(0 370)) tla(1 91 181 273 365, labs(small) val) 		///
		legend(order(1 "Top 10" 2 "Bottom 10"))									///
		byopts(																	///
		  ti("Daily Perfect Climate Index (PCI) for Switzerland, 1979-2013", 	///
		  size(medsmall)) subtitle(, size(small))  						 		///
	  	  note("Panels sorted by descending daily PCI" 					///
	  	  "Stations with >50% missings and altitudes >1000 m excluded", 		///
	  	  size(vsmall))													 		///
		  plotregion(margin(top)) rows(4)					 					///
		)																		///
		name(xtl_dpci7913hilo_d_nomiss_u1000, replace)

 

The code again uses -glabsort- to sort panels by PCI (see my first post). The graph is a panel-data plot (-xtline-), recast as -twoway spike-, because I like the display of the values as heights as opposed to lines. In contrast to temperature values, however, the zero line is arbitrary and has no special meaning, which militates against using spikes. Decide for yourself.

Let’s look at the distribution of the PCI on a map (I have excluded locations above 1000m altitude, because otherwise they would obviously occupy the lowest ranks):

CHmap_ydpci7913_bluered7_label

 

I know you expected nothing less than a chloropleth, but I don’t think I have enough data points to warrant such a display. Well, maybe one day I’ll do it anyway. In the meantime, this plot is also quite pretty. High PCIs are represented both by more saturated red hues as well as larger circles. Locations in the two highest caregories of PCI are additionally labelled. I had to play around with circle sizes, color schemes, background colors and labeling a while to find a combination that was not too busy and gave acceptable contrasts. I know there is some over-printing, but that’s fine for present purposes.

The color scheme is put together from two sequential color brewer schemes (colorbrewer2.org). Color values are given in RGB. In fact, Stata accepts not only its own named colors, but also RGB, CMYK, and HSV notation.

// mostly red color scheme, RGB values
	loc col "`"158 202 225"' `"222 235 247"' `"254 229 217"'`"252 146 114"'`"251 106 74"'`"203 24 29"'`"153 0 13"'"
	zmapv12 ydpci7913 lat lon, break(0 2.5 5 7.5 10 12.5) 					/// =
			ms(O ..) msize(medsmall medium medlarge large vlarge huge)		///
			mlabel(no no no no st st st) 									///
			mcolor(`col') 													///
			mlabc(`col')													///
			mlabs(*.55 ..) mlabv(labp) mlabangle(45 ..)						///
			ti("The Perfect Climate Index (PCI) for Switzerland, 1979-2013", ///
			size(medsmall))	note(" ") 										///
			legend(order(2 3 4 5 6 7) rows(1)) ysize(3.3) aspect()			///
			addplot(tw area _Y _X, cmissing(n) nodropbase color(gs14))		///
			name(CHmap_ydpci7913_bluered7_label, replace)

 

Not surprisingly, the PCI distribution confirms the results of the analysis in my previous post: Lugano & co. are the place to be when one is looking for some mediterranean mildness in Switzerland. Thank God for the Ticino! one is tempted to say.

The PCI, however, sucks as a global measure of pleasant climate. The main reason is its use of temperature – the hotter, the better – but this logic must break down at some point. Certainly, not even the most ardent sun worshipper will thrive at 45°C in the shade, a scorching sun, no rain, and not a single degree of cooling down in the night. Not only should T max be encoded with an optimum, e.g. 30°, but the day-night difference should also depend on it: the higher T max gets, the more nightly cooling is acceptable.

In an improved version, I’d also like to include wind speed and humidity. Personally, I hate anything that’s stronger than a mild summer breeze, and I love humid air. I would penalize wind speed relatively linearly, but perhaps set an optimum for humidity at 80%. Setting optima for variables presents an interesting challenge for mathematical implementation. You need a function that decreases (or increases) on both sides of the maximum. One way to achieve this is by using periodic functions such as the sine and cosine. But this is a task for another day – unless one of you beats me to it!

 


Materials

The data from IDAweb is copyright-protected. In order to let you reproduce this example I have added a small amount of random noise to the data. Please be aware that this renders the data factually invalid. Drop the data into a directory, point Stata to it, and run the example code below to produce all the figures. Download the data here.

    // Swiss weather data from IDAWEB (Meteo Swiss)
    // 4 measures from 1970-2013 from max. 123 stations
    
    cd ... // set your directory to the one where your data files are
    
    // read data files (.txt)
    clear
    local filelist : dir . files "*_data*"
    foreach file of local filelist {
                
        tempfile temp
        file open w using `temp', write append
        file open r using "`file'", read
        file read r line
        
        tokenize "`line'", parse(";")
        while "`1'" != "stn" {
            file read r line
            tokenize "`line'", parse(";")
        }
        // macro 5 contains var name/measure
        loc v `5'
        while r(eof)==0 {
            file write w "`line'" _n
            file read r line
        }        
        file close r
        file close w
        
        insheet using `temp', delimiter(";") clear names
        // save as Stata .dta file
        save "`v'", replace            

    }        

    
    local filelist2 : dir . files "*.dta"
    foreach file of local filelist2 {
        use `file'
        isid(stn time)
        di _n "`file'" _n
        *codebook stn
        gen date = date(string(time, "%10.0g"), "YMD")
        format %td date
        save "`file'", replace
    }            
    

    // make master merge file with full grid of N stations x N years x n days
    clear
    local files rka150d0 su2000d0 tre200dn tre200dx
    loc i = 1
    foreach file of local files {
        use `file'
        tempfile stnfile`i'
        sort stn
        encode stn, gen(stnn)
        collapse (firstnm) stnf=stnn, by(stn) // for some reason 'stnn' is not allowed as by-var
        drop stnf
        save `stnfile`i++''
    }
    loc --i            
    
    tempfile allstnfile
    use `stnfile1'
    forv j=2/`i' {
        merge 1:1 stn using `stnfile`j++'', nogen
    }
    save allstnfile, replace
    
    use allstnfile, replace
    encode stn, gen(stnn)
    label values stnn .
    loc nstations = _N    // N stations = 130
    drop stnn    // drops var, but keeps value label
    
    // N days 1.1.1968 - 1.1.2014
    loc date1 = date("19680101", "YMD")
    loc date2 = date("20140101", "YMD")
    loc ndays = `date2' - `date1' + 1  // N days = 16803 days
    loc nobs = `ndays' * `nstations'     
    
    set obs `nobs'
    egen date=seq(), from(`date1') to(`date2') block(`nstations') // produces 01jan1968 01jan1968 01jan1968 ...
    format %td date
    sort date stn
    egen stnn=seq(), t(`nstations') // 'from' defaults to 1
    label value stnn stnn
    bysort stnn (date): replace stn = stn[1]    
    save mastergrid, replace
    
    
    local files rka150d0 su2000d0 tre200dn tre200dx
    foreach file of local files {
        merge 1:1 stn date using `file', nogen
    }            
    drop time        
    save idaweb, replace
    order date,a(stnn)
    
    
    // dictionaries of station and measures
    ren rka150d0 rainfall
    ren su2000d0 sunshine
    ren tre200dx tmax
    ren tre200dn tmin
    
    label var rainfall "rainfall [mm], daily sum"
    label var sunshine "sunshine hours"
    label var tmax "temperature max [°C]"
    label var tmin "temperature min [°C]"

    // full station names
    clonevar station=stn
    replace station="Adelboden" if station=="ABO"
    replace station="Oberägeri" if station=="AEG"
    replace station="Aigle" if station=="AIG"
    replace station="Altdorf" if station=="ALT"
    replace station="Amsoldingen" if station=="AMS"
    replace station="Andeer" if station=="AND"
    replace station="Altenrhein" if station=="ARH"
    replace station="Les Attelas" if station=="ATT"
    replace station="Bargen" if station=="BAR"
    replace station="Basel / Binningen" if station=="BAS"
    replace station="Bern / Zollikofen" if station=="BER"
    replace station="Beznau" if station=="BEZ"
    replace station="Bière" if station=="BIE"
    replace station="Bischofszell" if station=="BIZ"
    replace station="Boltigen" if station=="BOL"
    replace station="Bouveret" if station=="BOU"
    replace station="La Brévine" if station=="BRL"
    replace station="Brienz" if station=="BRZ"
    replace station="Buffalora" if station=="BUF"
    replace station="Buchs / Aarau" if station=="BUS"
    replace station="La Chaux-de-Fonds" if station=="CDF"
    replace station="Cevio" if station=="CEV"
    replace station="Nyon / Changins" if station=="CGI"
    replace station="Chasseral" if station=="CHA"
    replace station="Château-d'Oex" if station=="CHD"
    replace station="Le Chenit" if station=="CHE"
    replace station="Chur" if station=="CHU"
    replace station="Cham" if station=="CHZ"
    replace station="Cimetta" if station=="CIM"
    replace station="Crap Masegn" if station=="CMA"
    replace station="Acquarossa / Comprovasco" if station=="COM"
    replace station="Piz Corvatsch" if station=="COV"
    replace station="Cressier" if station=="CRM"
    replace station="Davos" if station=="DAV"
    replace station="Les Diablerets" if station=="DIA"
    replace station="Disentis / Sedrun" if station=="DIS"
    replace station="La Dôle" if station=="DOL"
    replace station="Ebnat-Kappel" if station=="EBK"
    replace station="Eggishorn" if station=="EGH"
    replace station="Egolzwil" if station=="EGO"
    replace station="Einsiedeln" if station=="EIN"
    replace station="Elm" if station=="ELM"
    replace station="Engelberg" if station=="ENG"
    replace station="Evionnaz" if station=="EVI"
    replace station="Evolène / Villa" if station=="EVO"
    replace station="Fahy" if station=="FAH"
    replace station="Bullet / La Frétaz" if station=="FRE"
    replace station="Monte Generoso" if station=="GEN"
    replace station="Giswil" if station=="GIH"
    replace station="Glarus" if station=="GLA"
    replace station="Gösgen" if station=="GOE"
    replace station="Gornergrat" if station=="GOR"
    replace station="Fribourg / Posieux" if station=="GRA"
    replace station="Grenchen" if station=="GRE"
    replace station="Grimsel Hospiz" if station=="GRH"
    replace station="Col du Grand St-Bernard" if station=="GSB"
    replace station="Gütsch ob Andermatt" if station=="GUE"
    replace station="Güttingen" if station=="GUT"
    replace station="Genève-Cointrin" if station=="GVE"
    replace station="Salen-Reutenen" if station=="HAI"
    replace station="Hallau" if station=="HLL"
    replace station="Hörnli" if station=="HOE"
    replace station="Interlaken" if station=="INT"
    replace station="Jungfraujoch" if station=="JUN"
    replace station="Zürich / Kloten" if station=="KLO"
    replace station="Koppigen" if station=="KOP"
    replace station="Lägern" if station=="LAE"
    replace station="Langnau i.E." if station=="LAG"
    replace station="Leibstadt" if station=="LEI"
    replace station="Lodrino" if station=="LOR"
    replace station="Lugano" if station=="LUG"
    replace station="Luzern" if station=="LUZ"
    replace station="Männlichen" if station=="MAE"
    replace station="Magadino / Cadenazzo" if station=="MAG"
    replace station="Mathod" if station=="MAH"
    replace station="Meiringen" if station=="MER"
    replace station="Le Moléson" if station=="MLS"
    replace station="Mosen" if station=="MOA"
    replace station="Möhlin" if station=="MOE"
    replace station="Matro" if station=="MTR"
    replace station="Mühleberg" if station=="MUB"
    replace station="Montana" if station=="MVE"
    replace station="Napf" if station=="NAP"
    replace station="Naluns / Schlivera" if station=="NAS"
    replace station="Neuchâtel" if station=="NEU"
    replace station="Oron" if station=="ORO"
    replace station="Locarno / Monti" if station=="OTL"
    replace station="Payerne" if station=="PAY"
    replace station="Pilatus" if station=="PIL"
    replace station="Piotta" if station=="PIO"
    replace station="Plaffeien" if station=="PLF"
    replace station="Piz Martegnas" if station=="PMA"
    replace station="St-Prex" if station=="PRE"
    replace station="Pully" if station=="PUY"
    replace station="Quinten" if station=="QUI"
    replace station="Bad Ragaz" if station=="RAG"
    replace station="Zürich / Affoltern" if station=="REH"
    replace station="Poschiavo / Robbia" if station=="ROB"
    replace station="Robièi" if station=="ROE"
    replace station="Rünenberg" if station=="RUE"
    replace station="Säntis" if station=="SAE"
    replace station="Samedan" if station=="SAM"
    replace station="S. Bernardino" if station=="SBE"
    replace station="Stabio" if station=="SBO"
    replace station="Schmerikon" if station=="SCM"
    replace station="Scuol" if station=="SCU"
    replace station="Schaffhausen" if station=="SHA"
    replace station="Sion" if station=="SIO"
    replace station="Zürich / Fluntern" if station=="SMA"
    replace station="Sta. Maria, Val Müstair" if station=="SMM"
    replace station="Schüpfheim" if station=="SPF"
    replace station="St. Gallen" if station=="STG"
    replace station="Steckborn" if station=="STK"
    replace station="Aadorf / Tänikon" if station=="TAE"
    replace station="Titlis" if station=="TIT"
    replace station="Ulrichen" if station=="ULR"
    replace station="Valbella" if station=="VAB"
    replace station="Vaduz" if station=="VAD"
    replace station="Visp" if station=="VIS"
    replace station="Wädenswil" if station=="WAE"
    replace station="Weissfluhjoch" if station=="WFJ"
    replace station="Wynau" if station=="WYN"
    replace station="Zermatt" if station=="ZER"

    replace station="Weesen" if station=="WEE"
    replace station="Sargans" if station=="SAR"
    replace station="St. Chrischona" if station=="STC"
    replace station="Uetliberg" if station=="UEB"
     replace station="Mühleberg / Stockeren" if station=="MSK"
    replace station="Interlaken" if station=="INT"
    replace station="Bantiger" if station=="BAN"

    order station,b(stn)

    save idaweb, replace

    // checking for funny values
    foreach var of varlist rainfall sunshine tmin tmax {
            di _n as error "`var':" _n
            levelsof `var', clean
    }                        

    // another check for funny values
    foreach var of varlist rainfall sunshine tmin tmax {
            hilo `var'                     
    }                        
    
    foreach var of varlist rainfall sunshine tmin tmax {
            hilo `var', high show(20)
    }                        

    foreach var of varlist rainfall sunshine tmin tmax {
            hilo `var', low show(20)
    }                        

    // many weird highest values --> use median instead of mean
    
    
    
    // correcting funny values
    
    // 1 mm of rainfall = 1 l/m2
    // I'm really not sure how to correct high rainfall values, so I leave it alone
    
    replace sunshine=. if sunshine>16
    replace tmax=. if tmax>50
    replace tmin=. if tmin>50 | tmin < -50    // tmin contains the gargantuan value 340282346638.0 🙂
    
    // make tdiff
    gen tdiff = tmax-tmin
    label var tdiff "Diff T max - T min [°C]"    
    label var stnn "numeric version of stn"
    order tdiff,a(tmax)
    
    // date components
    gen year = year(date)
    gen dayy = doy(date)
    label var dayy "day in year"
    order year month day dayy, after(date)

    save idaweb, replace

    // missing values
    foreach var of varlist rainfall sunshine tmin tmax {
        gen m`var' = date if mi(`var')
        format %9.0g m`var'
        label var `var' "non-missing values indicate that corresponding var is missing"
    }
    
    save idaweb2, replace

    // collapsing over N of missings
    collapse ///
    (count) nmissrainfall    =mrainfall         ///
    (count) nmisssunshine    =msunshine         ///
    (count) nmisstmax        =mtmax             ///
    (count) nmisstmin        =mtmin             ///
    , by(stn year)
    encode stn, gen(stnn)

    // plot missings
    // this plots took hours of tinkering and are likely to look
    // somewhat different on your screen

    gen nothing=""
    cap drop yr
    gen yr=year-1.25    
    gen marker="{superscript:__}"    

    foreach var of varlist nmissrainfall nmisssunshine nmisstmax nmisstmin {                
            zmapv12 `var' stnn yr if yr<2014, break(5 180 360)                                         /// =
              ms(none ..) mlabel(nothing nothing marker marker) mlabcolor(gs14 gs14                 ///
              orange cranberry) mlabp(4 ..)                                                         ///
              yaxis(1 2) ysc(on lc(white) axis(1)) ysc(on lc(white) axis(2))                         ///
              yla(1(2)120, val labsize(*.29) labgap(half_tiny) noticks axis(1))                     ///
              ymt(1.25(2)120.25, noticks axis(1) grid glw(vvvthin) glc(gs13) glpattern(solid))        ///
              yla(2(2)120, val labsize(*.29) labgap(half_tiny) noticks  axis(2))                     ///
              ymt(2.25(2)120.25, noticks axis(2) grid glw(vvvthin) glc(gs15) glpattern(solid))        ///
              xsc(on) xti(" ") xla(1970(5)2010, labsize(vsmall)) xmt(1968(1)2013)                     ///
              note(" ") ti("Missing values: `var'", size(medsmall))                                 ///
              text(120 1968 "-", color(orange) size(vsmall) placement(top) yaxis(1) margin(bottom) justification(left))         ///
              text(120 1971.5 "180-360", color(gs2) size(vsmall) placement(top) yaxis(1) margin(bottom) justification(left))    ///
              text(120 1976 "-", color(cranberry) size(vsmall) placement(top) yaxis(1) margin(bottom) justification(left))         ///
              text(120 1978.5 "360+", color(gs2) size(vsmall) placement(top) yaxis(1) margin(bottom) justification(left))         ///
              legend(off) plotregion(margin(0 1 2 10)) graphregion(margin(b-5)) xsize(3.0) aspectratio(0.765)                     ///
              name(zmapv12_`var'_grid, replace)
              graph export zmapv12_`var'_grid.png, name(zmapv12_`var'_grid) replace
    }

    // map of stations
    
    use idaweb, clear
    gen str lonlat=""
    replace lonlat ="7°34'/46°30'" if station=="Adelboden"
    replace lonlat ="8°36'/47°08'" if station=="Oberägeri"
    replace lonlat ="6°55'/46°20'" if station=="Aigle"
    replace lonlat ="8°37'/46°53'" if station=="Altdorf"
    replace lonlat ="7°35'/46°44'" if station=="Amsoldingen"
    replace lonlat ="9°26'/46°37'" if station=="Andeer"
    replace lonlat ="9°34'/47°29'" if station=="Altenrhein"
    replace lonlat ="7°16'/46°06'" if station=="Les Attelas"
    replace lonlat ="8°36'/47°48'" if station=="Bargen"
    replace lonlat ="7°35'/47°32'" if station=="Basel / Binningen"
    replace lonlat ="7°28'/46°59'" if station=="Bern / Zollikofen"
    replace lonlat ="8°14'/47°33'" if station=="Beznau"
    replace lonlat ="6°21'/46°31'" if station=="Bière"
    replace lonlat ="9°14'/47°30'" if station=="Bischofszell"
    replace lonlat ="7°23'/46°37'" if station=="Boltigen"
    replace lonlat ="6°51'/46°24'" if station=="Bouveret"
    replace lonlat ="6°37'/46°59'" if station=="La Brévine"
    replace lonlat ="8°04'/46°44'" if station=="Brienz"
    replace lonlat ="10°16'/46°39'" if station=="Buffalora"
    replace lonlat ="8°05'/47°23'" if station=="Buchs / Aarau"
    replace lonlat ="6°48'/47°05'" if station=="La Chaux-de-Fonds"
    replace lonlat ="8°36'/46°19'" if station=="Cevio"
    replace lonlat ="6°14'/46°24'" if station=="Nyon / Changins"
    replace lonlat ="7°03'/47°08'" if station=="Chasseral"
    replace lonlat ="7°08'/46°29'" if station=="Château-d'Oex"
    replace lonlat ="6°13'/46°36'" if station=="Le Chenit"
    replace lonlat ="9°32'/46°52'" if station=="Chur"
    replace lonlat ="8°28'/47°11'" if station=="Cham"
    replace lonlat ="8°47'/46°12'" if station=="Cimetta"
    replace lonlat ="9°11'/46°51'" if station=="Crap Masegn"
    replace lonlat ="8°56'/46°28'" if station=="Acquarossa / Comprovasco"
    replace lonlat ="9°49'/46°25'" if station=="Piz Corvatsch"
    replace lonlat ="7°04'/47°03'" if station=="Cressier"
    replace lonlat ="9°51'/46°49'" if station=="Davos"
    replace lonlat ="7°12'/46°20'" if station=="Les Diablerets"
    replace lonlat ="8°51'/46°42'" if station=="Disentis / Sedrun"
    replace lonlat ="6°06'/46°25'" if station=="La Dôle"
    replace lonlat ="9°07'/47°16'" if station=="Ebnat-Kappel"
    replace lonlat ="8°06'/46°26'" if station=="Eggishorn"
    replace lonlat ="8°00'/47°11'" if station=="Egolzwil"
    replace lonlat ="8°45'/47°08'" if station=="Einsiedeln"
    replace lonlat ="9°11'/46°55'" if station=="Elm"
    replace lonlat ="8°25'/46°49'" if station=="Engelberg"
    replace lonlat ="7°02'/46°11'" if station=="Evionnaz"
    replace lonlat ="7°31'/46°07'" if station=="Evolène / Villa"
    replace lonlat ="6°56'/47°25'" if station=="Fahy"
    replace lonlat ="6°35'/46°50'" if station=="Bullet / La Frétaz"
    replace lonlat ="9°01'/45°56'" if station=="Monte Generoso"
    replace lonlat ="8°11'/46°51'" if station=="Giswil"
    replace lonlat ="9°04'/47°02'" if station=="Glarus"
    replace lonlat ="7°58'/47°22'" if station=="Gösgen"
    replace lonlat ="7°47'/45°59'" if station=="Gornergrat"
    replace lonlat ="7°07'/46°46'" if station=="Fribourg / Posieux"
    replace lonlat ="7°25'/47°11'" if station=="Grenchen"
    replace lonlat ="8°20'/46°34'" if station=="Grimsel Hospiz"
    replace lonlat ="7°10'/45°52'" if station=="Col du Grand St-Bernard"
    replace lonlat ="8°37'/46°39'" if station=="Gütsch ob Andermatt"
    replace lonlat ="9°17'/47°36'" if station=="Güttingen"
    replace lonlat ="6°08'/46°15'" if station=="Genève-Cointrin"
    replace lonlat ="9°01'/47°39'" if station=="Salen-Reutenen"
    replace lonlat ="8°28'/47°42'" if station=="Hallau"
    replace lonlat ="8°56'/47°22'" if station=="Hörnli"
    replace lonlat ="7°52'/46°40'" if station=="Interlaken"
    replace lonlat ="7°59'/46°33'" if station=="Jungfraujoch"
    replace lonlat ="8°32'/47°29'" if station=="Zürich / Kloten"
    replace lonlat ="7°36'/47°07'" if station=="Koppigen"
    replace lonlat ="8°24'/47°29'" if station=="Lägern"
    replace lonlat ="7°48'/46°56'" if station=="Langnau i.E."
    replace lonlat ="8°11'/47°36'" if station=="Leibstadt"
    replace lonlat ="8°59'/46°18'" if station=="Lodrino"
    replace lonlat ="8°58'/46°00'" if station=="Lugano"
    replace lonlat ="8°18'/47°02'" if station=="Luzern"
    replace lonlat ="7°56'/46°37'" if station=="Männlichen"
    replace lonlat ="8°53'/46°10'" if station=="Magadino / Cadenazzo"
    replace lonlat ="6°34'/46°44'" if station=="Mathod"
    replace lonlat ="8°10'/46°44'" if station=="Meiringen"
    replace lonlat ="7°01'/46°33'" if station=="Le Moléson"
    replace lonlat ="8°14'/47°15'" if station=="Mosen"
    replace lonlat ="7°53'/47°34'" if station=="Möhlin"
    replace lonlat ="8°55'/46°25'" if station=="Matro"
    replace lonlat ="7°17'/46°58'" if station=="Mühleberg"
    replace lonlat ="7°28'/46°18'" if station=="Montana"
    replace lonlat ="7°56'/47°00'" if station=="Napf"
    replace lonlat ="10°16'/46°49'" if station=="Naluns / Schlivera"
    replace lonlat ="6°57'/47°00'" if station=="Neuchâtel"
    replace lonlat ="6°51'/46°34'" if station=="Oron"
    replace lonlat ="8°47'/46°10'" if station=="Locarno / Monti"
    replace lonlat ="6°57'/46°49'" if station=="Payerne"
    replace lonlat ="8°15'/46°59'" if station=="Pilatus"
    replace lonlat ="8°41'/46°31'" if station=="Piotta"
    replace lonlat ="7°16'/46°45'" if station=="Plaffeien"
    replace lonlat ="9°32'/46°35'" if station=="Piz Martegnas"
    replace lonlat ="8°14'/47°32'" if station=="PSI"    
    replace lonlat ="6°27'/46°29'" if station=="St-Prex"
    replace lonlat ="6°40'/46°31'" if station=="Pully"
    replace lonlat ="9°13'/47°08'" if station=="Quinten"
    replace lonlat ="9°30'/47°01'" if station=="Bad Ragaz"
    replace lonlat ="8°31'/47°26'" if station=="Zürich / Affoltern"
    replace lonlat ="10°04'/46°21'" if station=="Poschiavo / Robbia"
    replace lonlat ="8°31'/46°27'" if station=="Robièi"
    replace lonlat ="7°53'/47°26'" if station=="Rünenberg"
    replace lonlat ="9°21'/47°15'" if station=="Säntis"
    replace lonlat ="9°53'/46°32'" if station=="Samedan"
    replace lonlat ="9°11'/46°28'" if station=="S. Bernardino"
    replace lonlat ="8°56'/45°51'" if station=="Stabio"
    replace lonlat ="8°56'/47°13'" if station=="Schmerikon"
    replace lonlat ="10°17'/46°48'" if station=="Scuol"
    replace lonlat ="8°37'/47°41'" if station=="Schaffhausen"
    replace lonlat ="7°20'/46°13'" if station=="Sion"
    replace lonlat ="8°34'/47°23'" if station=="Zürich / Fluntern"
    replace lonlat ="10°26'/46°36'" if station=="Sta. Maria, Val Müstair"
    replace lonlat ="8°01'/46°57'" if station=="Schüpfheim"
    replace lonlat ="9°24'/47°26'" if station=="St. Gallen"
    replace lonlat ="8°59'/47°40'" if station=="Steckborn"
    replace lonlat ="8°54'/47°29'" if station=="Aadorf / Tänikon"
    replace lonlat ="8°26'/46°46'" if station=="Titlis"
    replace lonlat ="8°18'/46°30'" if station=="Ulrichen"
    replace lonlat ="9°33'/46°45'" if station=="Valbella"
    replace lonlat ="9°31'/47°08'" if station=="Vaduz"
    replace lonlat ="7°51'/46°18'" if station=="Visp"
    replace lonlat ="8°41'/47°13'" if station=="Wädenswil"
    replace lonlat ="9°48'/46°50'" if station=="Weissfluhjoch"
    replace lonlat ="7°47'/47°15'" if station=="Wynau"
    replace lonlat ="7°45'/46°02'" if station=="Zermatt"
    replace lonlat ="9°05'/47°08'" if station=="Weesen"
    replace lonlat ="9°26'/47°03'" if station=="Sargans"
    replace lonlat ="7°41'/47°34'" if station=="St. Chrischona"
    replace lonlat ="8°29'/47°21'" if station=="Uetliberg"
    replace lonlat ="7°17'/46°57'" if station=="Mühleberg / Stockeren"
    replace lonlat ="7°52'/46°40'" if station=="Interlaken"
    replace lonlat ="7°32'/46°59'" if station=="Bantiger"

    gen slon  = substr(lonlat,1,strpos(lonlat,"/")-1)
    gen slat  = substr(lonlat,strpos(lonlat,"/")+1,.)
    gen lon =                                             ///    
    real(substr(slon,1,strpos(slon,"°")-1)) +            ///
    real(substr(slon,-3,2))/60
    gen lat =                                             ///    
    real(substr(slat,1,strpos(slat,"°")-1)) +            ///
    real(substr(slat,-3,2))/60
    drop slon slat
    label var lon "longitude [decimal]"
    label var lat "latitude [decimal]"    
    
    save idaweb, replace

    ///////////////////////////////////////////////////////
    // the PCI: Perfect Climate Index: only years 1979-2013
    ///////////////////////////////////////////////////////
    
        
    drop if year<1979 | year==2014
    save idaweb7913, replace
    
    cap label drop dayy
    label define dayy 1 " " 91 "Apr" 181 "Jul" 273 "Oct" 365 " " 366 " "
    label value dayy dayy
    encode station, gen(stationn)
    order stationn,a(station)
    

    // daily averages over years 1979 - 2013

    bysort station dayy: egen drain7913 = median(rainfall)             
    bysort station dayy: egen dsun7913 = median(sun)                 
    bysort station dayy: egen dtmin7913 = median(tmin)                 
    bysort station dayy: egen dtmax7913 = median(tmax)                 
    bysort station dayy: egen dtdiff7913 = median(tdiff)             
             
    
    // PCI 7913 daily, averaged over years
    bysort station dayy: gen dpci7913 = dtmax7913 - dtdiff7913 + .5*dsun7913 - log(drain7913+1)    
    bysort station year: egen ydpci7913 = mean(dpci7913)        

        // missing "indicators" and count for dpci7913
    gen mdpci7913  = date if mi(dpci7913)
    format %9.0g mdpci7913
    bysort station dayy: egen nmissdpci7913 = count(mdpci7913)
    codebook year     // -> 35 years
    gen pmissdpci7913 = 100*nmissdpci7913/35
    label var nmissdpci7913 "Daily N missings per station, across 1979-2013"
    label var pmissdpci7913 "% Daily N missings per station, across 1979-2013"

    bysort station: egen totnmissdpci7913 = count(mdpci7913)
    gen tot50nmissdpci7913 = totnmissdpci7913>12784/2 & !mi(totnmissdpci7913)
    label var tot50nmissdpci7913 ">50% years with missing values on dpci7913"

    // entering altitude values needed for next graph
    gen altitude = .
    replace altitude = 1320 if station=="Adelboden"
    replace altitude = 724 if station=="Oberägeri"
    replace altitude = 381 if station=="Aigle"
    replace altitude = 438 if station=="Altdorf"
    replace altitude = 638 if station=="Amsoldingen"
    replace altitude = 987 if station=="Andeer"
    replace altitude = 398 if station=="Altenrhein"
    replace altitude = 2730 if station=="Les Attelas"
    replace altitude = 740 if station=="Bargen"
    replace altitude = 316 if station=="Basel / Binningen"
    replace altitude = 552 if station=="Bern / Zollikofen"
    replace altitude = 325 if station=="Beznau"
    replace altitude = 683 if station=="Bière"
    replace altitude = 470 if station=="Bischofszell"
    replace altitude = 820 if station=="Boltigen"
    replace altitude = 374 if station=="Bouveret"
    replace altitude = 1050 if station=="La Brévine"
    replace altitude = 567 if station=="Brienz"
    replace altitude = 1968 if station=="Buffalora"
    replace altitude = 386 if station=="Buchs / Aarau"
    replace altitude = 1018 if station=="La Chaux-de-Fonds"
    replace altitude = 417 if station=="Cevio"
    replace altitude = 455 if station=="Nyon / Changins"
    replace altitude = 1599 if station=="Chasseral"
    replace altitude = 1029 if station=="Château-d'Oex"
    replace altitude = 1015 if station=="Le Chenit"
    replace altitude = 556 if station=="Chur"
    replace altitude = 440 if station=="Cham"
    replace altitude = 1661 if station=="Cimetta"
    replace altitude = 2480 if station=="Crap Masegn"
    replace altitude = 575 if station=="Acquarossa / Comprovasco"
    replace altitude = 3302 if station=="Piz Corvatsch"
    replace altitude = 431 if station=="Cressier"
    replace altitude = 1594 if station=="Davos"
    replace altitude = 2964 if station=="Les Diablerets"
    replace altitude = 1197 if station=="Disentis / Sedrun"
    replace altitude = 1669 if station=="La Dôle"
    replace altitude = 623 if station=="Ebnat-Kappel"
    replace altitude = 2893 if station=="Eggishorn"
    replace altitude = 521 if station=="Egolzwil"
    replace altitude = 910 if station=="Einsiedeln"
    replace altitude = 958 if station=="Elm"
    replace altitude = 1035 if station=="Engelberg"
    replace altitude = 482 if station=="Evionnaz"
    replace altitude = 1825 if station=="Evolène / Villa"
    replace altitude = 596 if station=="Fahy"
    replace altitude = 1205 if station=="Bullet / La Frétaz"
    replace altitude = 1600 if station=="Monte Generoso"
    replace altitude = 475 if station=="Giswil"
    replace altitude = 516 if station=="Glarus"
    replace altitude = 380 if station=="Gösgen"
    replace altitude = 3129 if station=="Gornergrat"
    replace altitude = 646 if station=="Fribourg / Posieux"
    replace altitude = 430 if station=="Grenchen"
    replace altitude = 1980 if station=="Grimsel Hospiz"
    replace altitude = 2472 if station=="Col du Grand St-Bernard"
    replace altitude = 2283 if station=="Gütsch ob Andermatt"
    replace altitude = 440 if station=="Güttingen"
    replace altitude = 420 if station=="Genève-Cointrin"
    replace altitude = 718 if station=="Salen-Reutenen"
    replace altitude = 419 if station=="Hallau"
    replace altitude = 1144 if station=="Hörnli"
    replace altitude = 577 if station=="Interlaken"
    replace altitude = 3580 if station=="Jungfraujoch"
    replace altitude = 426 if station=="Zürich / Kloten"
    replace altitude = 484 if station=="Koppigen"
    replace altitude = 845 if station=="Lägern"
    replace altitude = 745 if station=="Langnau i.E."
    replace altitude = 341 if station=="Leibstadt"
    replace altitude = 261 if station=="Lodrino"
    replace altitude = 273 if station=="Lugano"
    replace altitude = 454 if station=="Luzern"
    replace altitude = 2230 if station=="Männlichen"
    replace altitude = 197 if station=="Magadino / Cadenazzo"
    replace altitude = 437 if station=="Mathod"
    replace altitude = 588 if station=="Meiringen"
    replace altitude = 1974 if station=="Le Moléson"
    replace altitude = 452 if station=="Mosen"
    replace altitude = 344 if station=="Möhlin"
    replace altitude = 2171 if station=="Matro"
    replace altitude = 479 if station=="Mühleberg"
    replace altitude = 1427 if station=="Montana"
    replace altitude = 1403 if station=="Napf"
    replace altitude = 2400 if station=="Naluns / Schlivera"
    replace altitude = 485 if station=="Neuchâtel"
    replace altitude = 827 if station=="Oron"
    replace altitude = 366 if station=="Locarno / Monti"
    replace altitude = 490 if station=="Payerne"
    replace altitude = 2106 if station=="Pilatus"
    replace altitude = 990 if station=="Piotta"
    replace altitude = 1042 if station=="Plaffeien"
    replace altitude = 2670 if station=="Piz Martegnas"
    replace altitude = 334 if station=="PSI"
    replace altitude = 425 if station=="St-Prex"
    replace altitude = 455 if station=="Pully"
    replace altitude = 419 if station=="Quinten"
    replace altitude = 496 if station=="Bad Ragaz"
    replace altitude = 443 if station=="Zürich / Affoltern"
    replace altitude = 1078 if station=="Poschiavo / Robbia"
    replace altitude = 1896 if station=="Robièi"
    replace altitude = 611 if station=="Rünenberg"
    replace altitude = 2502 if station=="Säntis"
    replace altitude = 1708 if station=="Samedan"
    replace altitude = 1638 if station=="S. Bernardino"
    replace altitude = 353 if station=="Stabio"
    replace altitude = 408 if station=="Schmerikon"
    replace altitude = 1303 if station=="Scuol"
    replace altitude = 438 if station=="Schaffhausen"
    replace altitude = 482 if station=="Sion"
    replace altitude = 555 if station=="Zürich / Fluntern"
    replace altitude = 1383 if station=="Sta. Maria, Val Müstair"
    replace altitude = 742 if station=="Schüpfheim"
    replace altitude = 775 if station=="St. Gallen"
    replace altitude = 398 if station=="Steckborn"
    replace altitude = 539 if station=="Aadorf / Tänikon"
    replace altitude = 3040 if station=="Titlis"
    replace altitude = 1345 if station=="Ulrichen"
    replace altitude = 1569 if station=="Valbella"
    replace altitude = 457 if station=="Vaduz"
    replace altitude = 639 if station=="Visp"
    replace altitude = 485 if station=="Wädenswil"
    replace altitude = 2691 if station=="Weissfluhjoch"
    replace altitude = 422 if station=="Wynau"
    replace altitude = 1638 if station=="Zermatt"
    replace altitude = 425 if station=="Weesen"
    replace altitude = 487 if station=="Sargans"
    replace altitude = 493 if station=="St. Chrischona"
    replace altitude = 854 if station=="Uetliberg"
    replace altitude = 775 if station=="Mühleberg / Stockeren"
    replace altitude = 577 if station=="Interlaken"
    replace altitude = 941 if station=="Bantiger"
    order alt,a(lat)    
        

    // top & botom 10 with altitude<1000 and <50% missings

    glabsort stationn dpci7913, gen(stationndpci7913) lab(stationndpci7913) sort(-)

    ta stationndpci7913 if alt<1000 & !tot50nmissdpci7913, nol
    clonevar stationndpci7913hilou1000 = stationndpci7913 if !tot50nmissdpci7913
    replace stationndpci7913hilou1000 = . if alt>=1000 & stationndpci7913>10
    ta stationndpci7913hilou1000, nol
    replace stationndpci7913hilou1000 = . if stationndpci7913>10 & stationndpci7913<57

    save idaweb7913, replace

    // collapse data by station and day of year
    
    collapse dpci7913 stationndpci7913hilou1000, by(stationn dayy)
    label values stationndpci7913hilou1000 stationndpci7913

    // color top and bottom 10 panels differently    
    gen dpci7913top       = dpci7913 if stationndpci7913hilou1000 <=10
    gen dpci7913bottom = dpci7913 if stationndpci7913hilou1000 >=57

    xtline dpci7913top dpci7913bottom,                                             /// =
        recast(spike) i(stationndpci7913hilo) t(dayy)                             ///
        lc("scheme p2" "scheme p1") lw(vvvthin) subtitle(, size(small))            ///
        ytitle(" ", size(small)) ylabel(-10(10)20, labsize(small)                 ///
        grid glwidth(vvvthin) glcolor(gs14) nogmin nogextend)                    ///
        tti(" ") tsc(range(0 370)) tla(1 91 181 273 365, labs(small) val)         ///
        legend(order(1 "Top 10" 2 "Bottom 10"))                                    ///        
        byopts(                                                                    ///
          ti("Daily Perfect Climate Index (PCI) for Switzerland, 1979-2013",     ///
          size(medsmall)) subtitle(, size(small))                                   ///
            note("Panels sorted by descending daily PCI"                     ///
            "Stations with >50% missings and altitudes >1000 m excluded",         ///
            size(vsmall))                                                             ///
          plotregion(margin(top)) rows(4)                                         ///
        )                                                                        ///  
        name(xtl_dpci7913hilo_d_nomiss_u1000, replace)

    /////////////////////////////////////
    //   show color-coded PCI on map   //
    /////////////////////////////////////
    
    use idaweb7913, clear
    
    // the method here is based on this blog post:
    // http://mas802.wordpress.com/2013/10/03/overlay-longitude-latitude-data-on-a-map-with-stata/
    // get shapefile for Switzerland, e.g. here: http://biogeo.ucdavis.edu/data/gadm2/shp/CHE_adm.zip
    // install program to convert shapefiles to Stata data files

    *ssc install shp2dta
    shp2dta using CHE_adm0.shp, database(chdb) coordinates(chcoord) genid(id)
    append using chcoord
        
    keep station lon lat ydpci7913
    sort station
    collapse (first) lon lat ydpci7913, by(station)
    append using chcoord
    
    cap drop labp
    gen labp = 3 if !mi(station)
    replace labp = 1 in 68    // Lodrino 1
    replace labp = 1 in 83    // Naluns / Schlivera 1    
    replace labp = 1 in 112    // St. Gallen 1
    replace labp = 5 in 118    // Uetliberg 5
    replace labp = 2 in 129    // Zürich / Fluntern 1    
    replace labp = 1 in 70    // Luzern 1
    replace labp = 1 in 37    // Einsiedlen 1    
    replace labp = 5 in 29    // Col du... 5    
    replace labp = 5 in 79    // Männlichen 5    
    replace labp = 1 in 27    // Chateau 12    
    replace labp = 1 in 81    // Mühleberg 1    
    replace labp = 6 in 82    // Mühleberg / Stockeren 1    
    replace labp = 4 in 85    // Neuchâtel
    replace labp = 11 in 58    // La Brevine 11    
    replace labp = 11 in 12    // Basel / Binnigen 11    
    replace labp = 11 in 61    // Langnau 11    
    replace labp = 1 in 25    // Chasseral 1    
    replace labp = 12 in 31    // Cressier 11    

    gen nothing=""

    // mostly red color scheme, RGB values
    loc col "`"158 202 225"' `"222 235 247"' `"254 229 217"'`"252 146 114"'`"251 106 74"'`"203 24 29"'`"153 0 13"'"
    zmapv12 ydpci7913 lat lon, break(0 2.5 5 7.5 10 13)                     /// =
            ms(O ..) msize(medsmall medium medlarge large vlarge huge)        ///
            mlabel(no no no no st st st)                                     ///
            mcolor(`col')                                                     ///
            mlabc(`col')                                                    ///
            mlabs(*.55 ..) mlabv(labp) mlabangle(45 ..)                        ///
            ti("The Perfect Climate Index (PCI) for Switzerland, 1979-2013", ///
            size(medsmall))    note(" ")                                         ///             
            legend(order(2 3 4 5 6 7) rows(1)) ysize(3.3) aspect()            ///
            addplot(tw area _Y _X, cmissing(n) nodropbase color(gs14))        ///
            name(CHmap_ydpci7913_bluered7_label, replace)

*! 1.0.0 by Alex Gamma, 04 May 2013
*! adapted from -zmap- by NJC
*! 1.1.1 NJC 10 December 2012
*! 1.1.0 NJC 3 December 2012
*! 1.0.0 NJC 12 March 2010 

capture program drop zmapv12
program zmapv12, sort
    version 12   
    syntax varlist(numeric min=3 max=3) [if] [in], ///
    [ PCTiles(numlist min=1 >0 <100) ///
    BReaks(numlist min=1 sort) ///
    MULTiples MSymbol(str) MColor(str asis) addplot(str asis) * ]

    // check options
    if "`pctiles'" != "" & "`breaks'" != "" {
        di as err "may not specify pctiles() and breaks()"
        exit 198
    } 

    loc dpipe ||
    if `"`addplot'"'=="" loc dpipe

    // check data
    marksample touse  
    qui count if `touse'
    if r(N) == 0 error 2000 

    quietly {
        tokenize "`varlist'"
        args z y x 

        local zlab : variable label `z'
        if `"`zlab'"' == "" local zlab "`z'"
        
        local ylab : value label `y'
        if "`ylab'"=="" local ylab "`y'"
        
        /// default
        if "`pctiles'`breaks'" == "" {
            local pctiles 5 10 25 50 75 90 95
        }

        local nbreaks : word count `pctiles' `breaks' 

        if `nbreaks' <= 8 & `"`mcolor'"' == "" {
            if `nbreaks' == 1        local palette gs12 gs4
            else if `nbreaks' == 2   local palette gs14 gs8 gs2
            else if `nbreaks' == 3   local palette gs15 gs10 gs5 gs0
            else if `nbreaks' == 4   local palette gs15 gs12 gs8 gs4 gs0
            else if `nbreaks' == 5   local palette gs15 gs13 gs10 gs7 gs4 gs1
            else if `nbreaks' == 6   local palette gs15 gs13 gs11 gs9 gs7 gs5 gs3
            else if `nbreaks' == 7   local palette gs15 gs13 gs11 gs9 gs7 gs5 gs3 gs1
            else if `nbreaks' == 8   local palette gs15 gs14 gs12 gs10 gs8 gs6 gs4 gs2 gs0
            local palette mcolor(`palette' `palette')
        }
        
        local note "breaks `breaks'" 

        if "`pctiles'" != "" {
            local OK 1 5 10 25 50 75 90 95 99
            local notOK : list pctiles - OK
            local OK : list pctiles & OK
        
            if "`OK'" != "" {
                su `z' if `touse', detail
                foreach p of local OK {
                    local breaks `breaks' `= r(p`p')'
                }
            } 

            if "`notOK'" != "" {
                _pctile `z' if `touse', percentiles(`notOK')
                local i = 1
                foreach p of local notOK {
                    local breaks `breaks' `= r(r`i')'  
                    local ++i
                }
            } 

            local note "percentile breaks `pctiles'%"
            numlist "`breaks'", sort
            local breaks `r(numlist)'
        } 

        local allint = 1
        foreach b of local breaks {
            if `b' != ceil(`b') local allint 0
        } 

        local bprev : word 1 of `breaks'
        local breaks : list breaks - bprev
        local breaks `breaks' . 

        if "`format'" == "" {
            local fmt : format `z'
        }
        else local fmt `format' 

        if "`multiples'" == "" {
            tempvar y1
            gen `y1':`ylab' = `y' if `z' < `bprev'
            local Z = cond(`allint', "`bprev'", trim("`: di %4.1g `bprev''"))
            label var `y1' "-`Z'"
            local Y `y1'  
            local i = 2

            if "`msymbol'" == "" {  
                tempvar copy1
                gen byte `copy1' = .
                label var `copy1' "- `Z'"
                local COPY `copy1'
                local P p
                local S S
            }
                
            foreach b of local breaks {
                tempvar y`i'  
                gen `y`i'':`ylab' = `y' if `z' >= `bprev' & `z' < `b'
                local Y `Y' `y`i''
                local Z1 = cond(`allint', "`bprev'", trim("`: di %4.1g `bprev''"))
                local Z2 = cond(`allint', "`b'", trim("`: di %4.1g `b''"))
                if "`b'" == "." local Z2
                label var `y`i'' "`Z1' - `Z2'" 

                if "`msymbol'" == "" {
                    tempvar copy`i'
                    gen byte `copy`i'' = .
                    label var `copy`i'' "`Z1' - `Z2'"
                    local COPY `COPY' `copy`i''
                    local P `P' p
                    local S `S' S
                } 

                local bprev `b'
                local ++i
            }
            
            if "`msymbol'" == "" {
                local ny : word count `Y'
                local ny1 = `ny' + 1
                local ny2 = 2 * `ny'
                numlist "`ny1'/`ny2'"    
                local order order(`r(numlist)')
                local msymbol `P' `S'
            }

            if `"`mcolor'"' != "" {
                local palette mcolor(`mcolor' `mcolor')
            }
            
            noisily `addplot' `dpipe'     ///                    
            scatter `Y' `COPY' `x' if `touse' ///
            , ysc(off) xsc(off) ti(`"`zlab'"') ms(`msymbol') `palette' ///
            plotregion(style(none)) note(`note') `options'
        }
        else {
            tempvar g
            gen byte `g' = 1 if `z' < `bprev'
            local i = 2
            local Z = cond(`allint', "`bprev'", trim("`: di %4.1g `b''"))
            local G 1 "-`bprev'"
            
            foreach b of local breaks {
                replace `g' = `i' if `z' >= `bprev' & `z' < `b'
                local Z1 = cond(`allint', "`bprev'", trim("`: di %4.1g `bprev''"))
                local Z2 = cond(`allint', "`b'", trim("`: di %4.1g `b''"))
                if "`b'" == "." local Z2
                local G `G' `i' "`Z1' - `Z2'"
                local bprev `b'
                local ++i
            }
            
            tempname gname
            label def `gname' `G'                                                                      
            label val `g' `gname' 

            if `"`mcolor'"' == "" local mcolor "gs4" 

            noisily scatter `y' `x' if `touse' ///
            , by(`g',  ti(`"`zlab'"') compact note(`note')) ///
            yla(none) xla(none) ms(p) mcolor(`mcolor') legend(off) plotregion(style(none)) `options' ///
            || `addplot'
        }
    }
end
Advertisements

2 thoughts on “The Perfect Climate Index, Take I

  1. For your first graph, an alternative would have been to use -twoway rbar- to draw vectors (with the horizontal option), which might give better control over vertical placement, and more control over linewidth. See the -sqindexplot- command of the SQ package (ssc install sq) for an example (without gaps between the lines). There’s an example at http://teaching.sociology.ul.ie/bhalpin/sqind1.png

    • Thanks for pointing this out, Brendan! I have been able to recreate a quick-and-dirty version of the graph in no time (download here):

      The code:

      // code to quickly reproduce the missing data graph using sqindexplot
      // tip of the hat to Brendan Halpin for suggesting this 
      
      egen nmisstmax11 = cut(nmisstmax), at(0(36)396)
      sqset nmisstmax11 stnn year
      
      sqindexplot, order(stnn) 											///
      	color(white red*.1 red*.2 red*.3 red*.4 red*.5 					///
      		  red*.6 red*.7 red*.8 red*.9 cranberry) 					///
      	ti("Missing values: nmisstmax11", size(medsmall))	  			///
      	legend(size(vsmall)) yscale(noreverse) 							///
      	name(sqindexplot_nmisstmax11_year, replace)				
      
      

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s