Archive for February, 2017

STATISTICS AT THE PINNACLE-PART 2

February 15, 2017

I’d look at the audience and find two rows where I saw 22 people.  “What do you think is the probability that among these 22 there is at least one pair with the same birthday?”  I’d ask that because most people would think it is quite unlikely.  Our brains tell us that, but our brains can deceive us, not only the brains of others.  Turns out, the probability is about 50%.  I might even start with 7 rows containing 70 people, where the probability is 99.9%.  “This does not make intuitive sense,” I would add, “but is easily proven by taking the approach of the probability that no two people have the same birthday.”  My statistics advisor did this in a class I happened to be attending as a graduate student, and the first student’s birthday matched mine.  He said the look on the faces of the students was priceless.  “Why does this matter?  It matters because sometimes the way you think is wrong, flat out wrong!  Your brain lied to you.  Your brain said the likelihood of two people in the same room having the same birthday was small, very unlikely with 22 people, which is not true.  Our brains lie to us about speed, direction, and up and down.  They worry more about improbable losses than probable gains, and certain major events in our lives shape our thinking, even if they are very unlikely to ever happen again.  The solution to the birthday problem is also a good life lesson:  figure out what you don’t want and whatever is left over is what you want.”
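The "work out what you don't want" approach can be sketched in a few lines of Python. This is a minimal illustration, assuming 365 equally likely birthdays and ignoring leap years; the function name is chosen here and is not from the talk.

```python
def prob_shared_birthday(n: int, days: int = 365) -> float:
    """Probability that at least two of n people share a birthday.

    Assumes every day of the year is equally likely (an idealization).
    """
    p_all_different = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    # What is left over is the event we actually want.
    return 1.0 - p_all_different


for group_size in (22, 23, 70):
    print(group_size, round(prob_shared_birthday(group_size), 4))
```

Under these assumptions, 22 people give a little under 50%, 23 people give a little over, and 70 people give more than 99.9%.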

I’d talk about the lottery and expected values. People play the lottery, because eventually somebody wins.  We can predict quite accurately the probability that somebody will win.  “You see,” I’d say, “low probability events happen; they just happen with low probability.  Take the lottery with a 1 in 110 million chance of winning.  If 330 million tickets have been bought, the expected number of jackpot winners is 3.  That doesn’t mean that 3 will win, but it is expected.  We can easily, and I mean easily, calculate the probability of 0, 1, 2, 3, and 4 winners with a calculator and a few key strokes.  Three people nationwide might win.  Three people, in the entire country.  Yes, it has to be somebody.  But do you think it is going to be you?”
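Those "few key strokes" might look like the sketch below. It assumes each ticket is an independent random pick, so the number of winners is approximately Poisson with mean 330 million / 110 million = 3. The Poisson approximation and the function name are assumptions made here, not something stated in the talk.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the expected count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


expected_winners = 330_000_000 / 110_000_000  # 3.0

for winners in range(5):
    print(winners, round(poisson_pmf(winners, expected_winners), 4))
```

With an expected count of 3, there is still roughly a 5% chance that nobody wins at all, and exactly 2 winners is just as likely as exactly 3.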

If you have a disease, you have certain symptoms.  Medicine studies people who have certain symptoms and tries to figure out the probability of their having a disease.  Physicians and others would do well to understand the idea that not all who test positive for a disease have the disease.  “Suppose a disease has a 0.1% prevalence in the population, or 1 in 1000 people has it.  We would do well to teach percentages early in math and often, too.  Suppose if you have the disease, you test positive for it 98% of the time.  If you don’t have the disease, you test negative for it 99% of the time.  You test positive.  What is the likelihood you have the disease?”

What is important here is the background frequency of the disease.  The fact the disease occurs in only 1 of 1000 means that it is unlikely somebody who tests positive will have the disease: only about 9%.
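The 9% figure follows from Bayes' rule. Here is a minimal sketch using the numbers above (0.1% prevalence, 98% sensitivity, 99% specificity); the function and parameter names are illustrative.

```python
def prob_disease_given_positive(prevalence: float,
                                sensitivity: float,
                                specificity: float) -> float:
    """Bayes' rule: P(disease | positive test)."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


ppv = prob_disease_given_positive(prevalence=0.001,
                                  sensitivity=0.98,
                                  specificity=0.99)
print(f"{ppv:.1%}")
```

Of every 100,000 people tested, about 98 are true positives and about 999 are false positives, which is why a positive result is still far more likely to be wrong than right.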

Anybody remember W. Edwards Deming?  He was ignored here but found the Japanese receptive to his ideas about data analysis and optimizing systems.  The Japanese cleaned our clocks in the automotive industry before the Big Three caught on, not because Japanese cars were fancy, but because they worked.  There is an apocryphal story about how a Japanese company was told by an American buyer that no more than 4% of ball bearings should be faulty.  In the next shipment, 4 at the top of every box were faulty.  When asked why they were there, the company spokesman said, ‘you didn’t want more than 4% faulty.  Here they are, on top.  The rest are perfect.’

“Deming taught that variability could be classified as ‘common cause’ (noise) and ‘special cause’ (signal, important).  It was he who said that treating every variation as significant was not only wasteful; such ‘tinkering’ made the process worse.  How often do we hear that, say, a city’s murder count is higher than last year’s, and then hear somebody pontificate an explanation?  Have you ever heard that this is common cause variability, and that if you want to lower the murder number, you need to address the entire system?

Samples have to be random, which is a way of saying everybody in the population, the group of people one is studying, has a definable non-zero chance of being chosen.  That doesn’t mean, ‘They didn’t ask me, so the sample is no good.’  It’s no good if the sample is done in the Deep South and the sampler wants to extrapolate it to the whole country.  One living in New York or Ohio never had a chance of being sampled.  Most people think large samples mean more useful results, but bias in a sample of 200 continues to be bias in a sample of 200,000 if the methodology doesn’t change.  The mathematics of sampling is not difficult to understand.  If one is willing to be a little less confident, 90% rather than 95%, and to let the margin of error for a dichotomous (yes-no) question rise to 8 or 9%, rather than 1-2%, the sample size needed decreases dramatically.
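How dramatically the sample size drops can be seen with the usual formula for a proportion, n = z² p(1 − p) / E². The sketch below assumes simple random sampling from a large population and the worst case p = 0.5; the function name is chosen here for illustration.

```python
import math
from statistics import NormalDist


def sample_size(margin: float, confidence: float, p: float = 0.5) -> int:
    """Sample size for estimating a proportion to within +/- margin.

    Assumes simple random sampling; p = 0.5 is the most conservative choice.
    """
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * p * (1 - p) / margin**2)


print(sample_size(margin=0.02, confidence=0.95))  # tight margin, 95% confidence
print(sample_size(margin=0.08, confidence=0.90))  # looser margin, 90% confidence
```

Under these assumptions, the first case needs about 2,400 respondents and the second only a little over 100. None of this helps with bias: the formula only describes random error.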

“I’ve worn out my welcome, but let me finish by mentioning the concept of 2-3 standard deviations from the mean, which most people take as being a significant outlier.  That all depends on whether the curve is bell-shaped.  If it is, then the probability of something more than 2 standard deviations from the mean is about 5%.  But it is possible, depending upon the distribution of the data, to have up to 25% of items more than 2 standard deviations from the mean, hardly a significant outlier.  For 3 standard deviations, it is a 3 in 1000 chance with a bell-shaped curve, but with some distributions, up to 11% of the observations.  I wonder how many who have been 2-3 standard deviations on the wrong side of the curve have been punished unjustly.  Five standard deviations?  At most 4%.  That is 1/25, or 1 over 5 squared.  This bound is known as Chebyshev’s Inequality.
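The two sets of numbers can be put side by side: Chebyshev's bound of 1/k², which holds for any distribution with a finite variance, against the two-sided tail of a normal (bell-shaped) curve. A small sketch, with function names chosen here:

```python
import math


def chebyshev_bound(k: float) -> float:
    """Largest possible share of observations k or more SDs from the mean."""
    return 1 / k**2


def normal_two_sided_tail(k: float) -> float:
    """Share of a normal distribution more than k SDs from the mean."""
    return math.erfc(k / math.sqrt(2))


for k in (2, 3, 5):
    print(k, round(chebyshev_bound(k), 4), f"{normal_two_sided_tail(k):.2e}")
```

For k = 2 the gap is 25% versus about 4.6%; for k = 3 it is about 11% versus about 0.27%.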

“Finally, I would like to see students learn how to make good graphs instead of the ones I see today.  I would make Edward Tufte’s books required reading.  I would like to see more line graphs, dot plots, and box-and-whisker graphs with fewer multi-color pie charts.  I said I could go on for three more pages about statistics.  I have.  Statistics has many day-to-day uses, and it is often used poorly, both by those who don’t know it and, worse, by those who want to fool you.  It’s not lies, damned lies, and statistics but rather lies and damned people who lie using statistics.”

LIFELONG LEARNERS

February 12, 2017

The man had a lot of miles on him.  He smelled of tobacco and woodsmoke and was my age or a little older, and I had been helping him understand the rules of exponents.  His partner joined us briefly, and I wondered what both of their goals were.  I should have asked.  Then a young man in his late 20s or early 30s walked into the lab and, since I was the only tutor in an uncrowded room, asked if I could help him solve a math problem he said his teacher couldn’t.  I had an “Uh Oh” sort of moment, thinking I would face something awful, but the problem itself was fairly straightforward:

percent of opening= (air mixture-x)/(outside air temperature-x).  Solve for x.

It didn’t seem too difficult, and I solved it.  Then he said, “Oh, I forgot, this is an absolute value problem.”  Oh.  That made it a little more difficult, but I worked it out and came up with two solutions, which absolute value problems have, both of which checked, and looked at the rest of the problem, commenting, “so this might be how a thermostat works, right?”
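For the version without the absolute value, the rearrangement can be sketched as follows. The names p (fraction of opening), mixture, and outside are labels chosen here for illustration, and the sample temperatures are made up. Where the absolute value bars belong in the student's version is not stated, so that case is left out.

```python
def solve_for_x(p: float, mixture: float, outside: float) -> float:
    """Solve p = (mixture - x) / (outside - x) for x.

    Rearranging: p * (outside - x) = mixture - x
                 p * outside - p * x = mixture - x
                 x * (1 - p) = mixture - p * outside
                 x = (mixture - p * outside) / (1 - p)
    """
    if p == 1:
        raise ValueError("no unique solution when p equals 1")
    return (mixture - p * outside) / (1 - p)


# Hypothetical numbers, chosen only to check the algebra.
x = solve_for_x(p=0.25, mixture=70.0, outside=90.0)
print(x, (70.0 - x) / (90.0 - x))
```

Substituting the result back into the original equation, as in the last line, is the same check that confirmed both solutions in the lab.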

“Yes,” he replied.  “This is a cooling unit, and this equation solves for how much the damper should be open.”

“Wow,” I said.  “I’ve just learned something.”

“This type of cooling industry has only been around for two years,” the young man said.  I didn’t ask him details, but without too much effort, later I found myself looking at Heat and Mass Exchange (HMX) technologies and found something that looked very much like the equation the student gave me.  Efficiency of cooling systems has increased significantly, to 60%; what I was reading sounded like science fiction.

I’m not surprised.  When I substituted in math in Tucson high schools, I told the students that they would be working at jobs that not only didn’t exist yet but couldn’t even be imagined.  This student would have a job which didn’t exist when I moved here.

Thomas Friedman writes about these changes in Thank You for Being Late.  Friedman is a successful columnist leagues beyond my limited success, but I can relate to how chance meetings or chance thoughts can help create a column or produce a major change in one’s thinking.  Friedman writes how technology has moved well beyond the ability of society to adapt to it; technology is exponentially increasing, but our ability to adapt is linear with a small positive slope.  This is difficult for many, especially the twenty-year-olds who entered the labor force and didn’t realize they would have to be lifelong learners.  The current president was elected in large part by people who think that somehow all we need to do is bring back the high paying jobs that were once available for people with limited education.  At best, those jobs are now modestly paying, higher paying ones being reserved for those who have learned enough to navigate Friedman’s “Supernova,” a term he likes better than the “Cloud”.

My student was becoming a lifelong learner.  Like the elderly man learning exponents, he is at the community college obtaining math skills that he never learned when he was younger.  The job he has is not likely to be his only one.  The days of being in the same field for 40 years are not gone; Thomas Friedman is still a journalist.  The days of doing the same work for 40 years are mostly over, except in simple work, the kind that is likely to be automated.  Journalists no longer queue up at a telephone to send a story.  It’s streamed.  In energy production, rather than having miners underground, the entire earth over a seam can, sadly, be removed.  But even coal’s days are numbered, at least as a primary source of energy.  I think the days of most fossil sources of energy are numbered, not because of climate change, but because the technological advances in cleaner energy are so rapid that they are competing favorably, even despite a non-level playing field.  Solar energy efficiency has doubled in the last 30 years to 20-45%;  I remember when it “jumped” to 6%.

The ability to connect and to do things is greater than ever before, but one must have a decent education, meaning STEM subjects, the ability to write decently and communicate, good interpersonal skills, and a willingness to keep learning throughout one’s lifetime.  Put bluntly: you never finish school.  This isn’t going down well in places that were once manufacturing hotbeds, like Middletown, Ohio, in Hillbilly Elegy.

Not only will people need to become lifelong learners, they must be collaborators, requiring social skills, too. We need well-educated socialized  graduates with proven competency, a tall order.  Here’s my world:  recently, a friend asked me to look at a paper she and three others wrote. She is Colombian, now in school in Germany, has learned 2 languages in the past 5 years, and studies VaR (Value at Risk) near Berlin.  The paper was written in decent English, 5000 words and well referenced.  I had never heard of VaR before, although I should have, for it is a statistical financial measure.

Education is different.  Research has exploded, open source software is common, and people all over the world are collaborating.  I have my name on a meteorological paper written about pollution in Tabriz, Iran, because I helped an Iranian learn English.  She’s now living in Spain. A journalist friend of mine in New Delhi has changed jobs twice since I’ve known her, and she works hours that even I in my medical training didn’t work.  I’m not well connected, but through teaching English on various web sites I communicate at least weekly with people on five continents. I’ve communicated in German as well as English, and I’ve been offered jobs teaching English in both China and Brazil.  I bet I could get one in Iran too, if I dared go. A couple snowshoeing with me yesterday teach English in China, because the energy market crashed here.  She’s from New Zealand originally; both know a smattering of Mandarin.

A Kurdish woman I know in Iraq couldn’t find work as an engineer, so she re-invented herself as a travel agent and is doing well.  A Syrian asked me to help her sister with her English writing.  How she survived the past six years I have no idea.  The next paper I get from her sister will be an essay about the war.  A friend is German, on her way to Moscow to prepare for the launch of a satellite she helped design to one of the Lagrangian Points (where the pull of the Earth and the Sun balances a spacecraft’s orbital motion) to look at X-Ray radiation.  Still another is Russian, learning two languages to be able to become a translator in Europe.  Another emigrated from Iran to Australia, has a permanent stay card in Australia and hopes to become a medical professional.  I helped with some geometry problems a while back.  I get all sorts of perspectives about America, good and bad.

If I were young, I’d be learning at least two other languages, probably German and Russian, and maybe studying abroad.  In this era, having connections world-wide is important and  not too difficult to obtain, given the connectivity today.

Education must be flexible, with new courses quickly developed to understand new knowledge.  How we determine competency must also change, a piece of paper being less important than proven skills.  Home and online study will be important, but they aren’t the answer.  One needs a guide, a mentor, and a teacher all rolled into one.  How America will address education will be painful and very different not only from what it is today, but likely from what we can even imagine.  We will be required not only to deal with the Supernova-Cloud and collaborate internationally, but simultaneously to educate people with limited means, financial and neuronal, so they have some floor under them to keep them grounded, rather than looking for a wall to hang on, to quote Mr. Friedman.  Stay tuned.

Thank you for coming into the Math Lab, young man.  Had I not met you, I never would have seen so clearly what Mr. Friedman was writing about.  I don’t have the answers for society; I don’t even have them for me, but Mr. Friedman did write that knowing what questions to ask would be essential in the new world, and we statisticians make our living not by having all the answers, but trying to ask the right questions.

 

SAY MORE, ARTHUR BENJAMIN!–STATISTICS AT THE PINNACLE–PART I

February 7, 2017

One of my good hiking friends posted a TED talk by Arthur Benjamin on why we should teach statistics at the pinnacle of math education, rather than calculus.  I had only two complaints with his talk: first, it was too short, fewer than 3 minutes.  He should have gone on for an hour with that audience.  They would have learned a lot from him. Second, I’d add that statistics can teach us a lot about life lessons.

I commented briefly, saying that I could easily write for 3 pages.  Then I thought, “Why not?” Few will read it, because it’s math, and well….

Anyway, I’d start off with a self-deprecating statement about my field:  “We statisticians are almost never right.  That’s remarkable. Never right.  BUT, we know how wrong we are likely to be, because our estimates have a margin of error.  Any estimate that does not have a margin of error is, to us, worthless.  If that fact went to the Halls of Congress, if somebody said that ‘Social Security will be bankrupt by 2028,’ I’d like someone to ask, ‘What is the margin of error?’  Why?  Because somebody made a prediction about the future with data.  If somebody made a different prediction with slightly different assumptions, they would have gotten a different answer.  How different?  That is what margins of error are all about.  We’re talking about the future, and we can’t predict the future with utter confidence.

“What is confidence?” I would ask. “Let me first define probability: Probability is the likelihood an event will occur in the future, not the past.  It can be 0, no possibility at all; 1, certain it will occur, or any number in between those two.  Those who gamble know something about probability.  Good bridge players know probabilities of various distributions of cards; six missing cards in a suit are more likely to divide 4-2 than 3-3.  What we need to learn in this society is that two options are not necessarily equally likely.  Heads-tails is 50-50; boy-girl is close enough, although not exactly 50%.  But take ‘millions of illegals voted in the last election’ vs. ‘they did not,’ or ‘vaccines cause autism’ vs. ‘they don’t’: you still have two possibilities, but now they aren’t equal.  I wish the media would learn that and not assume all sides deserve equal billing.  As a corollary, I wish the media would remember that strong statements require strong evidence.
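The bridge claim is a counting exercise. The sketch below assumes the 26 unseen cards are dealt at random into two hidden hands of 13, with no information from the bidding or play; the function name and defaults are illustrative.

```python
from math import comb


def split_probability(a: int, b: int, unseen: int = 26, hand: int = 13) -> float:
    """Probability that a + b missing cards divide a-b between two hidden hands.

    Counts both orientations (e.g. 4-2 and 2-4) when a differs from b.
    """
    missing = a + b
    # Ways for one particular hand to hold exactly `a` of the missing cards.
    one_way = comb(missing, a) * comb(unseen - missing, hand - a) / comb(unseen, hand)
    return one_way if a == b else 2 * one_way


print("4-2:", round(split_probability(4, 2), 4))
print("3-3:", round(split_probability(3, 3), 4))
```

Under this assumption, a 4-2 division comes up a little under half the time and an even 3-3 division only a little over a third of the time.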

“Roll a die, and there is 1/6 chance a 3 will come up; all 6 possibilities have equal probability.  But when you roll two dice, there are 11 possible sums, from 2 to 12 inclusive, and their probabilities are not all 1/11.  If you disagree, please see me with your wallet in hand and we will play, because the expected value of my winnings, which is my average profit or loss per play over the long run, will be in my favor.  If I can bet on the fewest sums that will in the long run pay me money, I will choose 7, which has a 1/6 probability, 6 and 8, which each have a 5/36 probability, and either 5 or 9, each of which has a 1/9 probability.  In the long run, the probability will be 20/36 in my favor.  We need to teach that competing ideas do not necessarily have the same probability.  That means we shouldn’t give equal time to people who think alien abduction occurs just because it either does or doesn’t and they feel they should have equal say.  When we get to more significant probabilistic questions, such as smoking significantly increases the likelihood of lung cancer or heart disease, or that polio vaccination dramatically decreases the likelihood of contracting polio, we can and should make appropriate public policy.  Liberal theories?  Nope, just laws of mathematics that can be proven and which may be applied to everyday life.
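The dice arithmetic can be checked by listing all 36 equally likely outcomes. A minimal sketch, assuming two fair six-sided dice; the names are chosen here for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_distribution() -> dict:
    """Exact probability of each sum when rolling two fair dice."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(n, 36) for total, n in sorted(counts.items())}


dist = two_dice_distribution()
for total, prob in dist.items():
    print(total, prob)

# The bet described above: 7, 6, 8, and one of 5 or 9.
p_win = sum(dist[s] for s in (5, 6, 7, 8))
print("chance of winning:", p_win)
```

Four of the eleven sums carry 20 of the 36 outcomes, which is the 20/36 edge quoted above.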

“Furthermore, probability can be independent or dependent, and failure to remember that was in part behind the Challenger shuttle disaster. Independence means that the results of one trial don’t affect the next.  Dice don’t have a memory.  Dependence means that they do.  When one O-ring fails, the likelihood of another’s failing increases.  Pull three aces out of a deck of cards, and the probability I will draw an ace from the remaining cards is now 1/49.  That is a conditional probability.
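The card example shows how removing cards changes the odds of the next draw. A small sketch, assuming a standard 52-card deck with four aces; the function name is illustrative.

```python
from fractions import Fraction


def prob_next_card_is_ace(aces_removed: int = 0, others_removed: int = 0) -> Fraction:
    """Chance the next card drawn is an ace, given what has already been removed."""
    aces_left = 4 - aces_removed
    cards_left = 52 - aces_removed - others_removed
    return Fraction(aces_left, cards_left)


print(prob_next_card_is_ace())                # full deck
print(prob_next_card_is_ace(aces_removed=3))  # after three aces are pulled
```

A die, by contrast, has nothing removed between rolls, so its probabilities never change.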

“When we make an estimate of something, we need a margin of error, a wonderful concept which teaches us to be humble and say, ‘I could be wrong,’ four words every man ought to learn before getting married, and a breath of fresh air again in the Hallowed Halls of Power.  A caution, however: a margin of error doesn’t mean anything goes, that ‘anything is possible.’  Anything is possible only if one’s idea of possibility is that a one in a trillion event matters.  Statistics discusses things like million, billion, and trillion, so let me describe likelihoods for various scenarios:

  • 1 in 1000: about the likelihood of being dealt a full house in poker (closer to 1 in 700) or correctly picking a second at random that I have chosen which occurred in the last 17 minutes.
  • 1 in 10,000: about the likelihood of guessing right a kilometer I am thinking of between Chicago and Tokyo, or picking a minute correctly that I am thinking of that occurred in the past week.
  • 1 in 100,000: correctly picking a millimeter at random that I am thinking about on a football field from the back of end zone to the back of the opposite end zone.  Correctly pick an hour chosen at random in the past 12 years.
  • 1 in a million: Correctly pick a person chosen at random in a large city; a second chosen at random in the last 12 days; an acre I am thinking of in a large wilderness area 50 x 30 miles size.
  • 1 in two billion:  Correctly pick a second, chosen at random, from 1 January 1955 to now.  A single second. Correctly pick a randomly chosen acre in the US.
  • 1 in a trillion: Pick a day at random since the Earth was formed.  

I think that every legislator should be compelled to know the differences among million, billion and trillion before they are allowed to run for office, so we don’t get silly statements of “billions and billions, and billions of acres are locked up by the federal government.”  The whole country has fewer than 2.5 billion acres.  If you don’t have a sense of what a billion is, you can’t appreciate how much money a billionaire has.  A billionaire could spend two thousand dollars a minute, day and night, for nearly a full year before running out of money.  Ten million dollar house bought Monday morning?  Paid off Thursday evening.
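Several of the time and area comparisons above are plain multiplication, and a short sketch can verify them. It assumes 365.25 days per year, 640 acres per square mile, and 62 years from 1955 to the 2017 date of this post; the names are chosen here for illustration.

```python
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = MINUTES_PER_DAY * 60
ACRES_PER_SQUARE_MILE = 640


def rough_counts() -> dict:
    """How many equally likely choices each scenario offers."""
    return {
        "seconds in 17 minutes": 17 * 60,
        "minutes in a week": 7 * MINUTES_PER_DAY,
        "hours in 12 years": round(12 * 365.25 * 24),
        "seconds in 12 days": 12 * SECONDS_PER_DAY,
        "acres in 50 x 30 miles": 50 * 30 * ACRES_PER_SQUARE_MILE,
        "seconds in 62 years": round(62 * 365.25 * SECONDS_PER_DAY),
    }


def days_to_spend(amount: float, dollars_per_minute: float = 2000) -> float:
    """Days needed to spend `amount` at a constant rate, day and night."""
    return amount / dollars_per_minute / MINUTES_PER_DAY


for label, count in rough_counts().items():
    print(f"{label}: {count:,}")
print("days to spend $10 million:", round(days_to_spend(10_000_000), 2))
print("days to spend $1 billion:", round(days_to_spend(1_000_000_000), 1))
```

At that spending rate, ten million dollars lasts about three and a half days, while a billion lasts most of a year.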

“We use something called a confidence interval.  That is a range around an estimate where we state how confident we are that the true value lies in the interval.  It isn’t probability; it’s confidence.  You see, there exists a true value, but it is unknown and unknowable.  The range we have will either contain that true value or it won’t.  That is a 100%-0% question and not helpful.  We have 95% confidence intervals to explain that if we were to take 100 different samples, obtain 100 different estimates and confidence intervals, about 95 of them would contain the true value, but we wouldn’t know which 95.  See?  We don’t know the answer.  But we are highly confident we can construct an interval wherein it lies.
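The "about 95 out of 100" idea can be checked by simulation, where, unlike in real life, the true value is known. This sketch assumes normally distributed data with a known standard deviation so that the textbook z-interval applies. The population values, sample size, and function name are all made up for illustration.

```python
import random
from statistics import NormalDist


def simulate_coverage(true_mean: float = 50.0, sigma: float = 10.0,
                      n: int = 40, trials: int = 2000,
                      confidence: float = 0.95, seed: int = 1) -> float:
    """Fraction of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / n**0.5  # margin of error for each interval
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        # The interval is sample_mean +/- half_width; does it cover the truth?
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / trials


print(simulate_coverage())
```

The fraction printed should land close to 0.95. Any single interval either contains the true mean or it doesn't, which is why the 95% describes the procedure rather than one interval.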

Knowing confidence intervals would have been useful for journalists who reported on the once famous 44,000-98,000 deaths annually due to medical errors.  They rounded the latter figure up to 100,000 and used it, but the point estimate of 71,000 was the single best number.  Zero was not possible, nor 10,000, nor a million, not possible if we are going to remain sensible about the world.

“Predicting global climate change lends itself to statistics and to confidence intervals, and the IPCC was more than 95% confident years ago, a strong statement of science.  It means that the interval they calculated was highly likely not to contain 0, no temperature rise.  It is incumbent upon those who disagree to come up with a confidence interval so that we can look at their data and see what assumptions and calculations their models have.  This would prevent a lot of unnecessary arguing, and the arguments we have would be more appropriate.

“Means and medians are basic concepts people should understand, because a mean, the average, is affected greatly by outliers, whereas the median is not nearly as sensitive.  Housing prices and salaries are much better described by the median.  
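A tiny example shows how one outlier moves the mean but barely touches the median. The salary figures are invented for illustration.

```python
from statistics import mean, median

# Hypothetical salaries in dollars.
salaries = [40_000, 45_000, 50_000, 55_000, 60_000]
with_outlier = salaries + [5_000_000]  # one very highly paid newcomer

print("mean:  ", mean(salaries), "->", mean(with_outlier))
print("median:", median(salaries), "->", median(with_outlier))
```

The mean jumps from 50,000 to 875,000, a figure nobody in the group actually earns, while the median only moves from 50,000 to 52,500.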

People talk about a non-existent term called the Law of Averages.  I’d not teach it, and maybe it would go away. There is The Law of Large Numbers, which says frequencies of events with the same likelihood of occurrence even out, given enough trials or instances.”
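The Law of Large Numbers can be watched in a simulation of fair coin flips. This is a sketch with a fixed random seed so the run is repeatable; the function name is chosen here.

```python
import random


def heads_frequency(flips: int, seed: int = 0) -> float:
    """Fraction of heads in a given number of simulated fair coin flips."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(flips))
    return heads / flips


for flips in (10, 100, 10_000, 100_000):
    print(flips, heads_frequency(flips))
```

The fraction of heads settles toward one half as the number of flips grows. The coin never "owes" a head after a run of tails, which is the mistaken idea behind the so-called Law of Averages.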

“I can see that a lot of you are yawning and looking fried.  I’m giving you a year’s curriculum in a few minutes.  Imagine, however, how useful all this stuff might be if I had a year to teach it to students.  I actually tried to do that in Tucson in 2011, for free, as a trial course, my swan song before leaving town 3 years later.  But I didn’t have an education degree, and the school had other priorities.  Such a shame, really.   OK, let’s take a break, and come back and I’ll finish the summary.”