Bold predictions for 2012.
The election will be more about policy than economics, it will be between Romney and Obama, and Obama will win. I think the economy will improve, so Romney’s perceived economic management skills won’t be as meaningful to working-class independents and Democrats, whose help he would need to take the White House.
I think the economy will improve.
The United States will have above-expected to solid growth. There are waves rippling out from the US, hitting continents at roughly two-year intervals. The good thing for the US is that it has already survived the two roughest waves. In late 2007-08 consumer spending sucked in the US, in 2009 things started to turn worse for the consumer in Europe, and now, in late 2011-12, China looks to be weakening. The governments were hit after the consumer: in 2009 the Federal Funds rate dropped to its lowest, and in 2011 Europe almost collapsed. It didn’t, and I don’t think it will this year either. German leadership is too strong, France is on board, and the ECB has to follow. China is about to get hit on the consumer side; it may still have a couple of years until its government is in the spotlight. That’ll certainly be interesting to watch, and it could create conflict in Asia, particularly if the public protests. There are investment ideas that flow from this overall view.
- US Equities are underpriced. Risk will be in higher demand, driving prices higher.
- European Bonds could provide a great vehicle. Yields are beginning to top out; the currency is a wildcard to me. I think it could strengthen, but not much more than I think it could weaken.
- Some European Equities should be on the table. As they move through the process of saving their asses, European risk may move back into favor.
- I wouldn’t play in China anyway, but it is important to gauge the currency. It’s essentially a fight between the upward pressure (the natural position of the Yuan and the politicians pushing for it to rise) and the attempted soft landing.
So, the Bureau of Labor Statistics recently released the Oct. 2011 unemployment data. This is not a discussion of its validity or its impact; it’s a post on how to visualize it. This post is also for my posterity: I’ve wanted to be able to do this for a while, and it’ll serve as a reference. The map is my own, but the methods are pieced together from other sources.
So you can go over to the BLS Local Area Stats Page and get the data if you’d like to follow along.
First the data needs to be whipped into shape so we can use it in R, where we’ll create the map. I copied the chart from the link into MacVim (there may be better ways). Then, through a couple of s///g substitutions, I was able to get the file into CSV format, which means we’re ready to open R.
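If you’d rather skip the editor, the same cleanup can be sketched in R. This assumes, as the hint at the end of this post says, that the copied text separates the state name and the rate with a tab; the helper name `bls_to_csv` is hypothetical, and the sample rows are typed in by hand from the values shown below rather than read from BLS.

```r
# Hypothetical helper: turn "state<TAB>rate" lines copied from the BLS page
# into comma-separated lines, mirroring the s/\t/,/g edit in the editor.
bls_to_csv <- function(lines) {
  lines <- trimws(lines)          # drop stray spaces/tabs at either end
  lines <- lines[nzchar(lines)]   # skip blank lines
  sub("\t+", ",", lines)          # first run of tabs becomes a single comma
}

# Example rows typed by hand (not downloaded from BLS)
raw <- c("North Dakota\t3.5", "Nebraska\t4.2", "", "South Dakota\t4.5")
csv_lines <- bls_to_csv(raw)
# writeLines(csv_lines, "data.csv")  # then read.csv("data.csv", header = FALSE)
```

Replacing only the first run of tabs keeps multi-word names like "North Dakota" intact, since those contain spaces, not tabs.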
There are two libraries we’ll be using to help us with this visualization, ggplot2 and maps (map_data("state") below pulls its outlines from maps).
So of course we’ll load them into our session:
library(ggplot2)
library(maps)
Now that we have the libraries loaded, we need to get the unemployment data into the session.
unemp <- read.csv("data.csv", header = FALSE)
names(unemp) <- c("region", "percent")
# map_data("state") uses lower-case state names, so match that
unemp$region <- tolower(unemp$region)
head(unemp)
         region percent
1  north dakota     3.5
2      nebraska     4.2
3  south dakota     4.5
4 new hampshire     5.3
5       vermont     5.6
6       wyoming     5.7
So what we’re going to do next is create a single data.frame by merging two.
ggplot2 uses long and lat to map the data to the states, so we’ll need to associate the unemployment numbers with those long and lat numbers.
state_df <- map_data("state")
merged <- merge(state_df, unemp, by="region")
merged <- merged[order(merged$order),]
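The reordering line matters because merge() sorts its result by the join key, which scrambles the order in which polygon vertices have to be drawn. A toy sketch of the two steps, using made-up coordinates as stand-ins for map_data("state") (the `toy_` names are hypothetical, chosen so they don’t clobber the real objects):

```r
# Made-up outline points for two "states", in drawing order
toy_states <- data.frame(
  long   = c(-100, -97, -97, -104, -96, -96),
  lat    = c(46, 46, 49, 41, 41, 43),
  group  = c(1, 1, 1, 2, 2, 2),
  order  = 1:6,
  region = c("north dakota", "north dakota", "north dakota",
             "nebraska", "nebraska", "nebraska")
)
toy_unemp <- data.frame(region = c("nebraska", "north dakota"),
                        percent = c(4.2, 3.5))

# merge() joins on "region" but returns rows sorted by region...
toy_merged <- merge(toy_states, toy_unemp, by = "region")
# ...so put the vertices back in their original drawing order
toy_merged <- toy_merged[order(toy_merged$order), ]
```

Note that merge() does an inner join by default, so any region spelled differently in the two tables silently drops out of the map; passing all.x = TRUE keeps those outlines (with NA fill) so mismatches are easier to spot.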
Great, so now the only step left is to create the map.
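# fill by the unemployment rate column (percent), not by the unemp data.frame
# (linewidth replaced size for lines in ggplot2 3.4.0; use size = 0.2 on older versions)
ggplot(merged, aes(long, lat, group = group)) +
  geom_polygon(aes(fill = percent), colour = alpha("white", 1/2), linewidth = 0.2) +
  geom_polygon(data = state_df, colour = "white", fill = NA)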
And the finished product should look something like this:
 Hint: in the text copied from BLS, the space between the state name and the number is a tab, \t, so that’s what the s///g needs to match.
 I’ve been using ggplot2 for a couple weeks now, and it is awesome – highly recommended.
If there are two classes (this can be generalized to n) and both follow the same distribution, but with different parameters, it is possible to predict which class an observation comes from.
Here I’ll try to predict a person’s gender based on their height. The distribution of height is more or less normal, and a normal distribution has two parameters: the mean and the standard deviation. I’ll consider the easy case in this post: males and females have different average heights, but the distributions have the same standard deviation.
For the graphs and subscripts, male = 1, female = 0.
There are 3 things to do:
- Make sure our data is roughly normal. If our prediction is predicated on the data being normal, the data had better be normal.
- Derive the decision rule.
- Test how well our rule works.
The data set we’ll be using is from the Journal of Statistics Education. I’ve stripped out most of the information except for height and gender.
Again this looks good.
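A sketch of a normality check along these lines: a normal Q-Q plot per gender plus a Shapiro-Wilk p-value, assuming a data.frame with a numeric `height` column and a `gender` column coded 0/1. The JSE data isn’t reproduced in this post, so the usage below runs on simulated heights; those numbers are made up for illustration and are not the post’s data.

```r
# Normal Q-Q plot and Shapiro-Wilk p-value for each gender.
# Assumes columns `height` (numeric) and `gender` (coded 0/1).
check_normality <- function(hw) {
  groups <- split(hw$height, hw$gender)
  op <- par(mfrow = c(1, length(groups)))   # one panel per gender
  on.exit(par(op))
  for (g in names(groups)) {
    qqnorm(groups[[g]], main = paste("gender =", g))
    qqline(groups[[g]])                     # points near the line suggest normality
  }
  # shapiro.test() needs between 3 and 5000 observations per group
  sapply(groups, function(x) shapiro.test(x)$p.value)
}

# Simulated example data (NOT the JSE data set)
set.seed(42)
sim_hw <- data.frame(
  height = c(rnorm(120, mean = 165, sd = 7), rnorm(120, mean = 178, sd = 7)),
  gender = rep(c(0, 1), each = 120)
)
pvals <- check_normality(sim_hw)
```

A small p-value is evidence against normality; with large samples the test flags even harmless departures, which is why the plots are worth a look too.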
Deriving the Decision Rule
Great, so the data is normal, but what’s next? We’ll classify a case as male if the probability of that case being male is greater than the probability of it being female. Or, formally,
Because we’ve assumed normality, let’s put the pdfs into the inequality.
Remember, we assumed that the standard deviations were the same.
It’s fairly obvious from the equations that when,
the original inequality will hold. Now if we do some algebra we can see that when
the case will be classified as male. To visualize this, it would be a vertical line through the average of the means: anything to the right is classified as male, anything to the left as female.
To see how well our decision rule works, the data needs to be split into a training set, to put actual numbers to the rule, and then a testing set, to see how well the prediction works.
I’ll be using R to do the analysis. All the data is in a data.frame called hw.
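For reference, here is one way to write the derivation out in full, with $x$ the height, $\mu_1$ and $\mu_0$ the male and female means, and $\sigma$ the shared standard deviation. It assumes equal prior probabilities for the two genders, so comparing the probabilities reduces to comparing the two densities; that assumption is mine. Going from the second line to the third cancels the common positive factor $\frac{1}{\sigma\sqrt{2\pi}}$, takes logs (which preserves the inequality), and multiplies by $2\sigma^2 > 0$.

$$
\begin{aligned}
P(\text{male} \mid x) &> P(\text{female} \mid x) \\
\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu_1)^2}{2\sigma^2}} &> \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu_0)^2}{2\sigma^2}} \\
-(x-\mu_1)^2 &> -(x-\mu_0)^2 \\
(x-\mu_1)^2 &< (x-\mu_0)^2 \\
-2x\mu_1 + \mu_1^2 &< -2x\mu_0 + \mu_0^2 \\
2x(\mu_0 - \mu_1) &< (\mu_0 - \mu_1)(\mu_0 + \mu_1) \\
x &> \frac{\mu_0 + \mu_1}{2}
\end{aligned}
$$

The last step divides by $2(\mu_0 - \mu_1)$, which is negative because males are taller on average ($\mu_1 > \mu_0$), so the inequality flips.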
First we’ll split the data.frame into the training and testing sets.
> nr <- nrow(hw)
> hw.shuffle <- hw[sample.int(nr),]
> n.train <- as.integer(nr * .7)
> hw.train <- hw.shuffle[1:n.train,]
> # start one row after the training set so no row lands in both sets
> hw.test <- hw.shuffle[(n.train + 1):nr,]
Now that the data is split into two separate sets, the group means can be computed on the training set and the resulting rule checked against the test set.
> tapply(hw.train$height, hw.train$gender, mean)
So, by the decision rule derived above, anything larger than the average of 164.79 and 177.77, which is 171.28, will be classified as male, and anything under it will be classified as female.
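The rule is simple enough to wrap in a pair of functions, which avoids hard-coding the two means. A minimal sketch; `fit_midpoint_rule` and `predict_midpoint_rule` are hypothetical names, and it assumes gender is coded 0/1 as in this post.

```r
# Midpoint decision rule for two normal classes with a shared standard
# deviation (and equal priors): classify as male (1) when height exceeds
# the average of the two class means, otherwise female (0).
fit_midpoint_rule <- function(height, gender) {
  means <- tapply(height, gender, mean)      # indexed by level: "0", "1"
  threshold <- mean(means[c("0", "1")])
  list(means = means, threshold = threshold)
}

predict_midpoint_rule <- function(rule, height) {
  ifelse(height > rule$threshold, 1, 0)
}

# Using the two training means quoted in the post
rule <- list(threshold = mean(c(164.79, 177.77)))
pred <- predict_midpoint_rule(rule, c(160, 171, 172, 185))
```

In a live session, `fit_midpoint_rule(hw.train$height, hw.train$gender)` would compute the threshold straight from the training set.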
Now to set up the classification.
> hw.train.mean <- mean(c(164.79, 177.77))
> # 1 = male if above the midpoint, 0 = female otherwise
> hw.test$classify <- ifelse(hw.test$height > hw.train.mean, 1, 0)
> hw.test$classify <- as.factor(hw.test$classify)
> tab <- table(hw.test$gender, hw.test$classify)
> tab
     0  1
  0 62 12
  1 17 61
The table shows actual gender (rows) against predicted gender (columns). For example, there are 74 females in our test set (62 + 12), and we correctly predicted 62 of them.
This is pretty good: of 152 test cases, the decision rule correctly classified 123 (62 + 61), or about 81%. It could potentially be improved by allowing the two genders to have different standard deviations.
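Reading accuracy off a confusion matrix like this is easy to script. A small sketch that rebuilds the table from the counts shown (so it runs without the data; `conf_tab` is a hypothetical name) and computes the overall hit rate and the rate per gender. In a live session you’d use `tab` directly.

```r
# Counts from the table: rows = actual gender, columns = predicted gender
# (0 = female, 1 = male). matrix() fills column by column.
conf_tab <- matrix(c(62, 17, 12, 61), nrow = 2,
                   dimnames = list(actual = c("0", "1"),
                                   predicted = c("0", "1")))

overall_accuracy <- sum(diag(conf_tab)) / sum(conf_tab)  # 123 / 152
per_class <- diag(conf_tab) / rowSums(conf_tab)          # share of each gender classified correctly
```

The per-gender rates show whether the errors are lopsided: here females are classified correctly a bit more often (62 of 74) than males (61 of 78).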
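Relaxing the equal-standard-deviation assumption means the densities no longer reduce to a single midpoint, but the same "pick the class with the larger density" idea still works by comparing the two normal pdfs directly. A minimal sketch, again assuming equal priors and 0/1 gender coding; the function names are hypothetical, and this is not what produced the results above.

```r
# Fit one normal per class (its own mean and sd) and assign each height to
# the class with the larger density: a one-feature quadratic discriminant.
fit_two_normals <- function(height, gender) {
  list(mean = tapply(height, gender, mean),
       sd   = tapply(height, gender, sd))
}

classify_two_normals <- function(fit, height) {
  d0 <- dnorm(height, mean = fit$mean[["0"]], sd = fit$sd[["0"]])
  d1 <- dnorm(height, mean = fit$mean[["1"]], sd = fit$sd[["1"]])
  ifelse(d1 > d0, 1, 0)
}
```

When the two standard deviations differ, the densities can cross twice, so this rule may produce two cut points instead of the single midpoint; when they are equal, it reduces to the midpoint rule.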
And to wrap it up, a nice graph showing the rule overlaid on the polynomial density.
- There are much better ways to check for normality, but this’ll do here.
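A graph like that can be sketched in base R with a kernel density estimate (via `density()`) for each gender and a dashed vertical line at the threshold. This may differ from the density used in the original graph; `plot_rule` is a hypothetical name, and it assumes heights in cm and gender coded 0/1.

```r
# Kernel density of height for each gender, with the midpoint rule drawn
# as a dashed vertical line.
plot_rule <- function(height, gender, threshold) {
  d0 <- density(height[gender == 0])
  d1 <- density(height[gender == 1])
  plot(d0, xlim = range(d0$x, d1$x), ylim = c(0, max(d0$y, d1$y)),
       main = "Height by gender", xlab = "height (cm)", col = "red")
  lines(d1, col = "blue")
  abline(v = threshold, lty = 2)
  legend("topright", legend = c("female (0)", "male (1)", "rule"),
         col = c("red", "blue", "black"), lty = c(1, 1, 2))
  invisible(list(female = d0, male = d1))
}
```

For the post’s data, that would be `plot_rule(hw$height, hw$gender, 171.28)`.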
- Remember that when you multiply both sides of an inequality by a negative number you switch the inequality.
So this isn’t supposed to be only about soccer (football) by any stretch, but it’s in the center of my universe now.
Today’s Results (6/18) of Group C
US eked out a 2-2 tie against Slovenia.
England had a disappointing 0-0 tie against Algeria.
Where We Stand
US has 2 points and a +0 goal differential.
Slovenia has 4 points and a +1 goal differential.
England has 2 points and a +0 goal differential.
Algeria has 1 point and a -1 goal differential.
The final two matches of group C pit the US against Algeria and England against Slovenia.
1. US wins
If the US wins, they’re in. Simple as that.
2. US ties
With a tie, the US would advance if Slovenia beats England. If England and Slovenia tie instead, the US and England would finish level on points, and with 3 ties each the goal differential for both teams would obviously be zero, so the tie breaker in question would be goals scored. The US currently has a 2 goal lead over the English there, so England could not outscore the US by more than 2 goals in the final matches. If England beats Slovenia, the US is out.
3. US loses
If the US loses, they’re out. Simple as that.
As a first real post, I’d like to write about the US Men’s National Team.
General: The US team has a balance of experience and young players. Landon Donovan is striking as always, while Clint Dempsey and Bob Bradley’s son Michael Bradley head up the midfield. Dempsey showed his worth in last summer’s Confederations Cup, scoring an important goal in a stepping-stone win against Spain. Oguchi Onyewu returns from an injury sustained 7 months ago; it’ll be interesting to see whether, and how well, his fitness and form return. The sure-handed Tim Howard returns as goalkeeper.
Group play comes first with 3 matches.
Game 1: England V. US, 6/12, ABC, 2:30 PM EST
Ranked 8th in FIFA standings, England is a powerhouse with good players at almost every position. However, their venerable captain Rio Ferdinand was recently injured. This makes the back line weak and inexperienced and could come as an emotional blow less than a week from the first match.
Prediction: Even with the loss of Ferdinand the English have a great chance. I think the US is playing well enough (coming off two friendly victories) to get a tie.
Note: I’ll be burning my English jersey in effigy shortly before this game.
Game 2: Slovenia V. US, 6/18, ESPN, 10 AM EST
An okay team with young players, Slovenia is ranked 25th in FIFA standings. They haven’t had much success as a team yet, placing second to last in their Euro ’08 qualifying group. A relative unknown, there isn’t much to say about Slovenia.
Prediction: US needs a victory here if they lose their first game to England. This shouldn’t be too hard, though, given the US’s mixture of experience and youth, which should have an obvious advantage over Slovenia’s inexperienced youngsters.
Game 3: US V. Algeria, 6/23, ESPN, 10 AM EST
Having home continent advantage won’t be enough for Algeria, potentially the weakest team in the World Cup, to give the US much of a scare. They upset Egypt, a much stronger team, to make it.
Prediction: Again, US needs a victory here, which shouldn’t be too hard. The Algerians probably shouldn’t be here, and thus probably won’t be after group play is done.
The US should make it out of group play as the 2nd team in Group C behind England and any victories beyond group play should be seen as icing on the cake.
And since you’re wondering, my favorite teams for this WC are the US, France, The Orange, then Spain and Italy.