As an applied scientist, I consider understanding and using statistics properly to be essential. Unfortunately, statistics is the most misunderstood and least developed skill in the populace. This is amazing to me because all children under the age of five are natural statisticians. More specifically, they learn using stochastic processes.
Stay with me. "Stochastic" isn't a bad word just because it sounds scary. It's your friend, so embrace it. But what is it? Let's use a simple analogy.
A child drops a spoon while eating. The parent picks it up and gives it back to the child. A younger child repeats this process more times than an older child. The child is performing an experiment using a stochastic process. In other words, the child is using statistics to determine how likely it is that a spoon will hit the ground when dropped (basically, testing gravity). Gravity is pretty unforgiving, so the child learns that a dropped spoon will ALWAYS hit the ground (i.e., the outcome is 100% certain). "100% certain" is another way of saying there is a 100% probability that every attempt will have the same outcome. This is the basis of statistics and a stochastic process. Once this is learned, performing the experiment may be amusing, but it does not provide any more knowledge, so the child moves on to another experiment. Usually, something that will test the boundaries of a parent's patience.
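To put a number on the spoon experiment, here is a minimal sketch of how much each extra drop can still teach. It assumes every drop so far has hit the ground and asks: at 95% confidence, how large could the chance of the spoon NOT hitting the ground still be? The function name and the 95% level are my own choices for illustration, not part of the analogy above.

```python
def max_failure_probability(n_drops: int, confidence: float = 0.95) -> float:
    """Upper bound on P(spoon does NOT hit the ground) after n_drops
    drops that ALL hit the ground.

    If the true miss probability were p, the chance of seeing zero
    misses in n drops is (1 - p) ** n. The bound is the p at which
    that chance falls to 1 - confidence.
    """
    if n_drops <= 0:
        raise ValueError("n_drops must be a positive integer")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return 1.0 - (1.0 - confidence) ** (1.0 / n_drops)


# Each extra drop shrinks the bound, but by less and less.
for n in (1, 10, 100):
    print(n, round(max_failure_probability(n), 4))
```

The bound shrinks quickly at first and then flattens out, which is one way to read why the child eventually gets bored: after enough drops, another drop adds almost nothing.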
Almost all college students need to pass the dreaded statistics course, even liberal arts majors (who try to avoid science in favor of art and humanities, something that does not require math). The best statistics teacher I knew in college was my political science professor. After passing the course, almost all students continue to avoid the subject. This is because it is highly technical and difficult to do properly. The key word here is "properly".
The core reason statistics are synonymous with lies is that doing statistics properly is hard. Studies that are wrong are published all the time. This reinforces the belief that statistics is bullshit. Bad statistics are worse than bullshit; they are usually dangerous. Testing a vaccine is ALL about statistics. Almost all the statisticians I know crunch numbers for drug research. Doing proper research before releasing a vaccine (i.e., showing that the probability of the vaccine failing to work or killing someone is as close to 0% as possible) is pretty important.
It's sad to say that some really smart people making important decisions are bad at statistics. The first step in any study is determining if the data is valid. This is accomplished using the "run test" (more commonly called the runs test). It basically determines if the data comes from a repeatable process where each point is collected under the same conditions. This is where the problem occurs. Conditions change. Unless the conditions are locked down, it's not possible to determine the probability of a cause having an effect. For anything more complicated than dropping a spoon, controlling conditions is hard. It's even harder to determine cause and effect when the process used to collect the data is unknown.
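For the curious, here is a rough sketch of one common form of the runs test (the Wald-Wolfowitz test around the median), using only the Python standard library. It relies on a normal approximation, so treat the p-value as a rough guide for small samples. The function name and the sample data are made up for illustration.

```python
import math
from statistics import median


def runs_test(values):
    """Wald-Wolfowitz runs test about the median.

    Returns (number_of_runs, z_score, two_sided_p_value).
    Too few runs suggests drift or trends; too many suggests
    an alternating pattern. Either way, conditions were not stable.
    """
    m = median(values)
    # Label each point as above (True) or below (False) the median; drop ties.
    signs = [v > m for v in values if v != m]
    n1 = sum(signs)            # points above the median
    n2 = len(signs) - n1       # points below the median
    if n1 == 0 or n2 == 0:
        raise ValueError("need points on both sides of the median")
    n = n1 + n2
    # A new run starts every time the label changes.
    runs = 1 + sum(a != b for a, b in zip(signs, signs[1:]))
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    if variance == 0:
        raise ValueError("not enough data for the normal approximation")
    z = (runs - expected) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return runs, z, p_value


# Made-up example: a steadily drifting measurement (conditions changing).
drifting = list(range(1, 21))
print(runs_test(drifting))
```

A small p-value here says the order of the data matters, which is a warning that the points were probably not collected under the same conditions.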
Public officials are not required to use statistics to make decisions. This has a direct effect on public welfare. The wide variation in directives issued in response to COVID-19 outbreaks is making this clear. In reality, determining the cause of an outbreak is the only way to determine if changing behavior will have an effect on the outcome. The US strategy of urging self-quarantine and taking away ALL opportunities for infection is basically killing a fly with a cannonball. This is because experiments to determine cause are not, or cannot be, performed in a manner that allows statistics to be valid. Without using this scientific method, we are left to trust the wisdom of our leaders and governors.
Wisdom is a funny thing. It only works when the conditions that led to a belief are valid for the context in which the wisdom is used. For instance, take the belief that restricting the occupancy of a restaurant affects the mortality rate of COVID-19. This may be true in some instances, but to what degree depends on conditions. Some governors of localities (i.e., mayors) have measured the infection rates traced back to restaurant visits. At least one calculated a 1.5% infection rate; basically, it is 98.5% likely that an infection does NOT occur from a restaurant visit. Should restrictions be placed on restaurants across the board if this is true? If conditions across the board are the same, NO. If conditions across the board are NOT the same, YES.
It is pretty widely known that a single person who does not wear a mask properly, or at all, can infect a large area. It is best to restrict the movements of this "polluter" to contain the pathogen. In a sit-down restaurant with good practices, people don't move around much. In a fast food restaurant with self-service, people move around a lot. They are more likely to be in a hurry and more likely to be "polluters". So, maybe restricting self-service restaurants is warranted but restricting sit-down restaurants is not.
How do we know? Performing experiments would tell us. Let's say we have two restaurants in the same type of community with staff trained in the same manner. Say the restaurants have the same owner. Let's also say the restaurants have surveillance that records activity. Data can be gathered to determine the likelihood of "polluting" activities. This data can be analyzed statistically to determine cause and effect. One restaurant could impose a stricter policy, say 25% occupancy, while the other stays at 50% occupancy. If the data indicates the policy is being followed, the opportunity for pollution can be calculated to a reasonable degree of certainty. Guidelines and directives can therefore be more detailed and more effective, without completely shutting down entire sub-economies, like the restaurant industry.
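One way the two-restaurant comparison could be analyzed is a two-proportion z-test. This is a sketch, not the only valid approach. It assumes we can count "polluting" events and visits at each restaurant and that visits are independent. The function name and every count below are invented for illustration; they are not real data.

```python
import math


def two_proportion_z_test(events_a, visits_a, events_b, visits_b):
    """Compare the event rate at restaurant A with the rate at restaurant B.

    Returns (rate_a, rate_b, z_score, two_sided_p_value), using the
    pooled-proportion normal approximation.
    """
    rate_a = events_a / visits_a
    rate_b = events_b / visits_b
    # Pooled rate under the assumption that the policy makes no difference.
    pooled = (events_a + events_b) / (visits_a + visits_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    if std_err == 0:
        raise ValueError("no variation in the data; the test is undefined")
    z = (rate_a - rate_b) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return rate_a, rate_b, z, p_value


# Invented counts: restaurant A at 25% occupancy, restaurant B at 50%.
print(two_proportion_z_test(events_a=30, visits_a=1000,
                            events_b=60, visits_b=1000))
```

A small p-value would suggest the difference between the two policies is unlikely to be chance alone, but only if the other conditions (community, staff, training) really were held the same.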
OK, so maybe this is hard, but it is necessary. It will remain necessary until inoculations are ubiquitous enough to make the probability of infection too low for concern. Say, 1 infection per million people per month across the US. This is possible. It took decades to eliminate polio and measles through inoculations (after development of an effective vaccine). But it took only a couple of years to eradicate yellow fever, and sharply reduce malaria, in the area around the Panama Canal during construction. (A massive force killing mosquitoes was employed, and breeding grounds were found and destroyed relentlessly.)
The notion that you can make statistics say whatever you want is only true for BAD statistics. Obtaining GOOD statistics is hard, and distinguishing between GOOD and BAD statistics is also hard. So, yeah, you can make BAD statistics say whatever you want. However, making a GOOD decision requires a GOOD statistical study with high probability from a trusted (i.e., tested/competent) source. If the study results in a probability of a specific cause having a specific effect, and its conditions match the conditions being controlled by a decision, the result is predictable. It is rarely possible to do this, therefore GOOD statistics are very hard to come by.
That being said, trying to get GOOD statistics as the basis of decisions MUST not be abandoned. If anything, more education on statistics is needed. We are tested periodically to determine if we have the skill to drive. Having a test for making decisions based on good statistics seems like a good idea, but it is unlikely to occur. A good leader will take the time to appreciate this skill and surround themselves with people who provide good statistics and can detect when the conditions under which those statistics are employed are valid. This requires the leader to listen before AND AFTER a policy is made and deployed. A good statistician will be vocal about the validity of the policy given what effect it actually has and bring new recommendations to the leader's attention. A good leader will change the policy and recognize that conditions in the real world may or may not be the same as those used to calculate the statistics. When conditions are not the same, better statistics must be calculated and policies adjusted. If done properly, the matter will be resolved as quickly as possible.