On Decision and Confidence

We make many decisions every day, consciously as well as unconsciously. The term “decisions” here covers not just the high-level processes that govern how we think and assess events and observations, but also the low-level ones that control perception and movement. For example, have you ever wondered how our body can move so smoothly? This is definitely not an easy task if you ask any roboticist struggling to implement human gaits on robots. Some scientists believe that our brain makes reliable, quick-fire predictions about the result of every movement we make, which results in the efficient sequence of actions that we call “walking”.

Confidence plays a crucial role in this process. How confident we feel about our choices influences our behavior. If we did not have a confidence mechanism that is usually right, we would have a hard time correcting bad decisions.

Important as it is, the way confidence works remains an unsolved riddle. The classical approach assumes that the brain takes shortcuts when processing information: it makes approximations rather than using precise statistical calculations. However, in a very recent paper, Adam Kepecs, professor of neuroscience at Cold Spring Harbor Laboratory, concluded that the subjective feeling of confidence stems from objective statistical calculations in the brain.

 

To determine whether the brain uses objective calculations to compute its level of confidence, Kepecs created a video game to compare human and computer performance. Human volunteers listened to streams of clicking sounds and determined which stream of clicks was faster. Participants rated confidence in each choice on a scale of one (a random guess) to five (high confidence). What Kepecs and his colleagues found was that human responses were similar to statistical calculations: the brain produces feelings of confidence that inform decisions the same way statistics pulls patterns out of noisy data.

Figure 2
The human feeling of confidence follows statistical predictions in a perceptual decision task. Source: Adam Kepecs et al.

To further examine his model, Kepecs organised another experiment in which participants answered questions comparing the populations of various countries. Unlike the perceptual test, this one had the added complexity of each participant’s individual knowledge base. Even human foibles, such as being overconfident when facing hard choices with poor data or under-confident when facing easy choices, were consistent with Kepecs’s model.

This is not the first time a scientist has suggested that our brain relies on a statistical model rather than a heuristic one. In many perception tasks, it has been shown that people tend to make estimates in a way that fits the Bayesian probability framework. There is also evidence that the brain makes internal predictions and updates them in a Bayesian manner. When we read a book or listen to someone talking, for example, our brain is not simply receiving information; it is constantly analyzing this stream of data and predicting what it expects to read or hear. These predictions strongly influence what we actually read or hear. More generally, we can argue that our perception of the world is in fact a reconstruction made by the brain: we don’t (or can’t?) see the world as it is, but rather the way our brain expects it to be.

To maintain a level of consistency between the real world and this “reconstructed” reality, the brain constantly revises its predictions based on what information comes next. Making predictions and re-evaluating them seems to be a universal feature of the brain: at all times it is weighing its inputs and comparing them with internal predictions in order to make sense of the world.

So far we have seen some arguments supporting the (Bayesian) statistical paradigm. However, scientists from the “anti-Bayesian” camp have provided a number of strong counter-arguments, especially when it comes to high-level decision making. It is fairly easy to come up with probability puzzles that should yield to Bayesian methods, but that regularly leave many people flummoxed. For instance, many people will say that if we toss a series of coins, getting all heads or all tails is less likely than getting some “seemingly random” sequence such as tails–tails–heads–tails–heads. It is not: since the coin tosses are independent, there is no reason to expect one particular sequence to be more likely than another. There is considerable evidence, like the coin-toss example above, showing that most people are basically non-Bayesian when performing high-level, logical reasoning.
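If the arithmetic doesn’t convince you, a quick simulation might. Below is a minimal Python sketch (the trial count and the sequences compared are arbitrary choices of mine) that tosses five fair coins many times and shows that “HHHHH” comes up about as often as any other specific sequence:

```python
import random
from collections import Counter

# Simulate many runs of five fair coin tosses and count how often each
# exact sequence shows up. "HHHHH" is no rarer than a "random-looking" one.
trials = 1_000_000
counts = Counter(
    "".join(random.choice("HT") for _ in range(5)) for _ in range(trials)
)

for seq in ("HHHHH", "TTTTT", "TTHTH"):
    print(seq, counts[seq] / trials)  # each is close to (1/2)**5 = 0.03125
```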

All in all, we are dealing with the most complicated thing in the known universe, and all the discoveries about our brain so far just scratch the surface. A lot of work still needs to be done in order to truly understand how we think.

In conclusion, I believe that the Bayesian paradigm, with its quirks and imperfections, represents a promising approach that can eventually help us see the complete picture of our brain.

Human thought process from a data nerd’s point of view

This post is inspired by a late-night discussion with a friend at a party (yes, because at 2 am, with a stomach full of mojito, there is no better topic to talk about than Machine Learning), so please take it with a grain of salt.

The ultimate goal of Artificial Intelligence is, as its name suggests, to create a system that can reason the way a human does. Today, despite all the hype about Deep Learning, people who work in Machine Learning and AI know that there is still a long way to go to achieve that dream. What we have done so far is extremely impressive, but every single modern ML algorithm is very data-intensive, meaning it needs a lot of examples to work well. Besides, in some sense, ML algorithms are just “remembering” what we show them; their ability to extrapolate knowledge is very limited, or nonexistent. For example, show a baby a dog, and later she can easily distinguish between a dog and a cat, even though she has never “seen” a cat. Most ML models cannot do that: if you show them something they have never seen, they will just try to find the most similar thing in their vocabulary and assign it to that.

Figure: “Sorry dude, I have no idea how to create Skynet.”

Anyway, today’s topic is not about the machines. In this post, I want to take the opposite approach and compare the human thought process with a machine learning model.

For me, the way we reason and make decisions follows a generative model: we compute a probability distribution over the options, and then we choose the most probable one. We extensively use Bayes’ rule to incorporate new observations into our worldview, which means that in our mind we already have a prior probability distribution for every phenomenon. Each time we get new information about a particular phenomenon, we update the corresponding prior. For example, someone who has spent his whole life in a tropical country is 100% sure that when it is sunny, it is hot. Now if he moves to a country in the temperate zone, he will have to “update” this belief, because in winter it is cold with or without the sun.

Figure: Bayes’ theorem. Source: gaussianwaves.com
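To make the sunny/hot example concrete, here is a minimal sketch of the update using Bayes’ rule. All the probabilities below are made-up numbers, chosen only to illustrate how the same observation produces very different beliefs under different priors:

```python
# Bayes' rule: P(hot | sunny) = P(sunny | hot) * P(hot) / P(sunny)
# The numbers below are purely illustrative assumptions, not data.

def posterior_hot(prior_hot, p_sunny_given_hot, p_sunny_given_not_hot):
    """Belief that it is hot, after observing that it is sunny."""
    p_sunny = p_sunny_given_hot * prior_hot + p_sunny_given_not_hot * (1 - prior_hot)
    return p_sunny_given_hot * prior_hot / p_sunny

# Tropical prior: hot days are the norm, so sun almost guarantees heat.
print(posterior_hot(prior_hot=0.95, p_sunny_given_hot=0.9, p_sunny_given_not_hot=0.5))  # ~0.97

# After moving to the temperate zone: hot days are rarer and winter sun is
# common, so the same observation yields a much weaker belief in "hot".
print(posterior_hot(prior_hot=0.40, p_sunny_given_hot=0.9, p_sunny_given_not_hot=0.6))  # ~0.50
```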

This prior belief is what people usually call “prejudice”, and how hard it is to change depends on the individual. I would argue that for a young, open-minded person, her “prejudice distribution” has the form of a Gaussian curve with high variance: it doesn’t carry a lot of statistical strength, which allows her to update her belief easily. In contrast, someone whose Gaussian has very low variance holds a firm belief (or prejudice), and it is very difficult to change their mind.

Figure: For the same mean, a person with higher “variance” is more open-minded (the peak is lower, the tail has more weight). Source: Wikipedia.
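For the statistically inclined, here is a tiny sketch of this analogy using a conjugate Gaussian update with known observation noise (all numbers are hypothetical): the same observation pulls a wide, “open-minded” prior much further than a narrow, “firm” one:

```python
# One-step Gaussian belief update with known observation noise.

def update(prior_mean, prior_var, obs, obs_var):
    """Posterior mean and variance after a single observation."""
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    post_mean = post_var * (prior_mean / prior_var + obs / obs_var)
    return post_mean, post_var

observation, obs_var = 10.0, 1.0

# Open-minded: wide prior around 0 -> the posterior mean jumps close to 10.
print(update(prior_mean=0.0, prior_var=25.0, obs=observation, obs_var=obs_var))

# Firm belief: narrow prior around 0 -> the posterior mean barely moves.
print(update(prior_mean=0.0, prior_var=0.1, obs=observation, obs_var=obs_var))
```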

Now let’s think about the decision-making process. Just like an ML model, we use “data” to make a “prediction”. Each “data point” is a collection of many features, i.e., pieces of information that can potentially affect the decision.

Before going any further, we need to talk about the “bias-variance dilemma” in Machine Learning. In his amazing article “A few useful things to know about machine learning”, Pedro Domingos gave the following explanation:

Bias is a learner’s tendency to consistently learn the same wrong thing.

Variance is the tendency to learn random things irrespective of the real signal.

Figure: Bias and variance illustrated with dart throwing. Source: “A few useful things to know about machine learning”, Pedro Domingos.

The bias-variance tradeoff says that as the bias increases, the variance tends to decrease, and vice versa. This tradeoff links directly to two classic problems in ML: overfitting and underfitting.

Overfitting occurs when the model learns the noise and cannot generalise well (low bias – high variance).

Underfitting is the opposite: the model is too simple to capture the real signal (high bias – low variance).

When building a model, every ML practitioner faces this challenge: finding the sweet spot between bias and variance.
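To see what this sweet spot looks like in practice, here is a small, self-contained sketch (the data, noise level, and polynomial degrees are arbitrary choices of mine): a straight line underfits a noisy sine wave, a moderate polynomial does reasonably well, and a very high-degree polynomial hugs the training points but typically does worse on unseen data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples from a smooth underlying signal (one period of a sine wave).
x_train = np.sort(rng.uniform(0, 1, 30))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, x_train.size)
x_test = np.linspace(0, 1, 200)
y_test = np.sin(2 * np.pi * x_test)

for degree in (1, 4, 12):  # underfitting, about right, overfitting
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}  train MSE {train_mse:.3f}  test MSE {test_mse:.3f}")
```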

In my opinion, the problem with the human thought process is that in auto mode, our mind is constantly “underfitting” the “data”, the reason being that our mental model is too simple to deal with the complexity of life (wow, I sound like a philosopher haha!). I need to emphasize “auto mode” here because when we are conscious of the situation and focused on the task at hand, we become much more effective. However, over 90% of the decisions we make every day are made unconsciously (just to be clear, I don’t mean that we are in a coma 90% of the time…).

The question now is: why is our mental model too simple? From an ML point of view, I can think of three reasons:

  1. Lack of data: this may seem to contradict what I said at the beginning about our great ability to learn from few observations. However, I still stand my ground: humans are amazing at learning and extrapolating concrete concepts. The problem arises with abstract, complex ones that don’t have a clear definition. In these cases, the decision boundary is non-linear and extremely complicated, and thus without enough data our mind fails to fit an appropriate model.
  2. Lack of features: this one is interesting. When building an ML system, we are usually encouraged to reduce the number of features because it helps the model generalize better and avoid overfitting. Moreover, a simpler model needs less computational power to run. I believe that our mind works the same way: by limiting the number of features going into the mental model, it can process information faster and more efficiently. The problem is that for complex situations, the model doesn’t have enough features to make good decisions. One obvious example is when we first meet someone. It is commonly known that we have just seven seconds to make a first impression. Statistically speaking, this is because our mental model for first impressions takes only appearance into account as a feature; it doesn’t care (at that very moment) about the person’s personality, job, education, …
  3. Wrong loss function: the loss function is the core element of every ML algorithm. Concretely, to improve the quality of its predictions, an ML algorithm needs a metric that tells it how well it is doing so far. That is where the loss function comes into play: it measures the “gap” between the desired output and the actual prediction. The ML algorithm then just needs to optimise that loss function (see the sketch after this list). If we think about our thought process, we can see that for certain tasks, we have had the wrong idea about the “loss function” from the beginning. An extreme example is when we want to please or impress someone: we begin to bend our opinions to suit theirs, and eventually our worldview is largely shaped by theirs. This is because our loss function in this case is “their satisfaction” instead of “my satisfaction”. This is also why people usually say that the key to success is to “fake it till you make it”: if your loss function is your success, get out of your comfort zone and do whatever the most successful people are doing, and your mental model will eventually change to maximize it.
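As promised in point 3 above, here is a minimal sketch of how a loss function steers a model. The data, the learning rate, and the single-parameter model are made up; the point is simply that whatever we encode in the loss is what the learner ends up optimising:

```python
# Gradient descent on mean squared error for a one-parameter model y ~ w * x.
# Toy data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]

w = 0.0     # initial guess
lr = 0.01   # learning rate

for _ in range(500):
    # Gradient of the loss (mean squared error) with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad  # step in the direction that reduces the loss

print(round(w, 3))  # close to 2.0 -- the loss function defined what "good" meant
```

Swap in a different loss (say, matching someone else’s answers instead of the ground truth) and the same procedure will happily converge to a different model.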

So what can we do to improve our mental model, or more concretely, to make better decisions? This is a very hard question and I am not at all qualified to answer it. However, for the sake of argument, let’s think about it as an ML model: what would you do if your ML model doesn’t work well? Here are my suggestions:

  1. Experience more: this is the obvious solution to the lack of data. By getting out of your comfort zone and stretching your mind, you will “update” your prior beliefs more quickly. So do whatever challenges you, be it physically or mentally: read a book, ride a horse, run 20 km, implement an ML algorithm (my favorite ahah), just please don’t sit there and let social media shape your mental model.
  2. Be mindful: as I said earlier, when we are really conscious of our actions, we can perform at a whole new level with incredible efficiency. By being mindful, we can use more “features” than our mental model usually takes into account, and thus get a better view of the situation. However, this is easier said than done; I don’t think we can biologically maintain that state all the time.
  3. Reflect on yourself: each week/month/year, spend some time reflecting on your “loss function”: what is your priority? What do you want to do? Who do you want to become? Let it be the compass for your actions and your decisions, and you will soon be amazed by the results.

In conclusion, a mental model is just like an ML model in production: you cannot modify its output on the fly. If you want to improve its performance systematically, you need to take time to analyse and understand why it works the way it does. This is a trial-and-error, iterative process that can be long and tedious, but it is crucial for every model.

Experience more, embrace mindfulness, and reflect often; sooner or later you will possess a robust mental model. All the best!

 

Random thoughts on randomness, or why people suck at long-term vision

The law of large numbers is one of the foundational theorems of probability theory. It says that the average of the results obtained from a large number of trials should be close to the expected value, and will tend to get closer as more trials are performed.

Figure: Example of the law of large numbers: tossing a coin.
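A quick simulation (a toy sketch, nothing more) shows the theorem at work: the running average of fair coin tosses drifts towards the expected value of 0.5 as the number of tosses grows:

```python
import random

random.seed(42)

# 1 = heads, 0 = tails; the expected value of a fair toss is 0.5.
tosses = [random.randint(0, 1) for _ in range(100_000)]

for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, sum(tosses[:n]) / n)  # the average creeps towards 0.5
```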

This theorem is very simple and intuitive. And perhaps because it is so intuitive, it becomes counter-intuitive. Why? Let’s talk about the gambler’s fallacy: in an event with a binary outcome, if there has been a long run of one outcome, an observer might reason that because the two outcomes are destined to come out in a given ratio over a lengthy set of trials, the outcome that has not appeared for a while is temporarily advantaged. Upon seeing six straight occurrences of black from spins of a roulette wheel, a gambler suffering from this illusion would confidently bet on red for the next spin.

Why is it fallacious to think that sequences will self-correct for temporary departures from the expected ratio of the respective outcomes? Ignoring for a moment the statistically correct answer that each spin is independent of the others, and imagining that the gambler’s illusion is real, we can still point out many problems with that logic. For example, how long would this effect last? If we take the roulette ball and hide it for 10 years, how will it know, when unearthed, to have a preference for red? Obviously, the gambler’s fallacy can’t be right.

So why can’t the law of large numbers be applied in the case of the gambler’s fallacy?

Short answer: Statistically speaking, humans are shortsighted creatures.

Long answer: People generally fail to appreciate that occasional long runs of one outcome or the other are a natural feature of random sequences. If you don’t buy it, let’s play a small game: take out a small piece of paper and write down a sequence of random binary numbers (1s and 0s, for example). Once you are done, count the length of the longest run of either value. You will notice that this number is quite small. It has been demonstrated that we tend to avoid long runs: the sequences we write usually alternate back and forth too quickly between the two outcomes. This appears to be because people expect random outcomes to be representative of the process that generates them, so if the trial-by-trial expectations for the two outcomes are 50/50, we try to make the series come out almost evenly divided. People generally assume too much local regularity in their notion of chance; in other words, people are lousy random number generators.
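If you would rather not trust your own pen-and-paper sequence, here is a small sketch that measures the longest run in genuinely random binary sequences of length 100; the typical value (around 6 to 7) is longer than what most of us would dare to write down:

```python
import random

def longest_run(seq):
    """Length of the longest run of identical consecutive values."""
    best = cur = 1
    for prev, nxt in zip(seq, seq[1:]):
        cur = cur + 1 if nxt == prev else 1
        best = max(best, cur)
    return best

random.seed(0)
runs = [longest_run([random.randint(0, 1) for _ in range(100)])
        for _ in range(10_000)]
print(sum(runs) / len(runs))  # on average close to 7 for sequences of 100 flips
```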

So there you have it: we can see that humans are, by nature, statistically detail-oriented. We don’t usually consider the big picture but recognize only a few “remarkable” details, which then shape our point of view about the world. When we meet a new person, observations of a few isolated behaviors lead directly to judgments of stable personal characteristics such as friendliness or introversion. Here it is likely that observations of another person’s behavior are perceived not as potentially variable samples over time, but as direct indicators of stable traits. This problem is usually described as the “law of small numbers”, which refers to the tendency to impute too much stability to small-sample results.

Obviously, knowing about this won’t change our nature, but at least once we acknowledge our bias, we can be more mindful of the situation and of our decisions.