To clarify my own understanding of the course, I am summarizing the concepts that sounded most interesting to me. My aim is to look for general concepts, the ones that -I guess- will remain in my memory after the course is over. I'm just a student, so I may be wrong or may have misunderstood some issues; any correction or comment is welcome, and I apologize in advance for my English, as I am not a native speaker. In short, I intend to write a kind of "20 questions for dummies", trying to preserve the core ideas but presenting them in a simplified way for casual readers.
Well, enough talk, let's start now. From the first module, on how smartphones work, I learned that they are called "cell" phones because they use low-power directional antennas, radiating forward over an angle of 270 degrees, placed so as to cover small areas called "cells" because of their resemblance to the cells of a honeycomb.
The operation of smartphones is much more sophisticated than most people imagine. How can they send and receive such a huge amount of information when the available spectrum is limited? How can the signals be kept from interfering with each other and slowing everything down? The answer is the CDMA system, which uses digital coding to make much more efficient use of the available spectrum.
But that's not all: there are two other sophisticated mechanisms, based on reusing the same frequency for different users and on dynamically controlling the power used to transmit and receive.
The interference problem can be understood with the "cocktail party" example. Suppose you are at a noisy party trying to talk to someone, but you hear not only what that person tells you, but also the voices of everyone else shouting, which keeps you from communicating clearly. What would be the solution? Obviously, if everyone agrees to speak at a lower volume, things will improve a lot for all.
That is precisely what is done with phones: transmit and receive with minimum power. But how low? If the power is too low, the signal will not reach the antenna; if it is too high, it will cause interference. The power required depends on the distance, and since phones are mobile, the optimal power changes dynamically for each device based on its location.
The cell phone system is a good example of how different actors compete dynamically for the use of a scarce resource, as in the Tragedy of the Commons. The scarce resource is the allocated bandwidth. Those who once operated ham radio will recall the problem, when a guy with a big amplifier drowned out everyone else.
In a public telephone system, such as cell phones, overpowering everyone else by raising your own power is not an option. Instead, a socially optimal solution is required. The Pareto optimal points form a boundary of the feasible region: beyond that boundary, any improvement for one user can only be obtained at another's loss. In other words, at a Pareto optimum there are no inefficiencies in the use of resources.
The situation of many contenders competing in a non-cooperative way for a scarce resource (in this case the allocated frequency spectrum) is a classic problem in economics that applies to many kinds of networks: individual behaviours driven by self-interest, competing under fair and efficient rules, aided by feedback information. That is how mobile phones deal with power control.
One way to allocate limited resources is optimization with linear programming: the goal here is to minimize the transmission power, the constraint is a required level of interference quality, the same for all, and the constants are the available channels and the noise. This gives the region of feasible solutions, whose boundary is Pareto optimal.
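As an illustration only (the gains, noise levels and SIR targets below are made-up numbers, not anything taken from the course), the power-minimization problem can be written as a linear program and handed to an off-the-shelf solver such as SciPy's linprog:

```python
# A minimal sketch: minimize total transmit power subject to each link
# meeting a required signal-to-interference ratio (SIR). All numbers invented.
import numpy as np
from scipy.optimize import linprog

G = np.array([[1.0, 0.1, 0.2],
              [0.1, 1.0, 0.1],
              [0.2, 0.1, 1.0]])      # G[i][j]: gain from transmitter j to receiver i
noise = np.array([0.1, 0.1, 0.1])    # receiver noise
gamma = np.array([2.0, 2.0, 2.0])    # required SIR for each link

n = len(noise)
# SIR constraint: G[i,i]*p[i] >= gamma[i] * (sum_{j != i} G[i,j]*p[j] + noise[i]),
# rearranged into the A_ub @ p <= b_ub form that linprog expects.
A_ub = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        A_ub[i, j] = gamma[i] * G[i, j] if j != i else -G[i, i]
b_ub = -gamma * noise

res = linprog(c=np.ones(n), A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * n)
print("minimum-power allocation:", res.x)
```

The solution sits exactly on the boundary where every link just meets its SIR target, which is the Pareto-optimal border mentioned above.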
The other method for allocating resources is game theory, modelling the demand for power as a competition. There are two basic types of games: cooperative (like the coordination game) and competitive (like the prisoner's dilemma); power control is a competitive model.
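As a toy sketch (the payoffs are the textbook prisoner's dilemma values, not anything specific to power control), the equilibrium of such a competitive game can be found by checking best responses:

```python
# Find the Nash equilibrium of a prisoner's-dilemma-style game by checking
# that neither player can gain by unilaterally switching actions.
payoff = {  # (row_action, col_action) -> (row_payoff, col_payoff)
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"):    (0, 5),
    ("defect",    "cooperate"): (5, 0),
    ("defect",    "defect"):    (1, 1),
}
actions = ["cooperate", "defect"]

def is_nash(a_row, a_col):
    row_ok = all(payoff[(a_row, a_col)][0] >= payoff[(alt, a_col)][0] for alt in actions)
    col_ok = all(payoff[(a_row, a_col)][1] >= payoff[(a_row, alt)][1] for alt in actions)
    return row_ok and col_ok

print([cell for cell in payoff if is_nash(*cell)])   # [('defect', 'defect')]
```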
In practice, CDMA uses an iterative algorithm, adjusted with negative feedback, in which the transceivers tend to converge to the optimal state. The equation used is:
P_i(t+1) = (G_i / SIR_i(t)) * P_i(t)
where P_i(t+1) is the adjusted power value, G_i is the target level, SIR_i(t) is the signal-to-interference ratio at time t (the feedback signal), and P_i(t) is the current power level. This is a distributed algorithm because each transmitter-receiver pair adjusts its own power, but through the feedback of the signal-to-interference ratio it also takes into account the total effect of the system.
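A minimal simulation of this update rule, reusing the made-up gains, noise and SIR targets from the linear-programming sketch above, shows the powers converging even though each link only uses its own measured SIR as feedback:

```python
# Iterative power control: P_i(t+1) = (G_i / SIR_i(t)) * P_i(t).
# Gains, noise and targets are the same invented numbers as before.
import numpy as np

G = np.array([[1.0, 0.1, 0.2],
              [0.1, 1.0, 0.1],
              [0.2, 0.1, 1.0]])
noise = np.array([0.1, 0.1, 0.1])
target = np.array([2.0, 2.0, 2.0])   # G_i: the desired SIR level per link

p = np.ones(3)                        # arbitrary starting powers
for t in range(50):
    signal = np.diag(G) * p                       # useful received power per link
    interference = G @ p - signal + noise         # everyone else's power plus noise
    sir = signal / interference                   # the feedback each link measures
    p = (target / sir) * p                        # each link's local update

print("converged powers:", p.round(4))
```

After a few dozen iterations the powers settle at essentially the same values the linear program produces, which is the point of the distributed algorithm: no central controller is needed.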
Some may think, "OK, this is all very interesting, but why should I care how smartphones work?" It is not really about the phones, but about the concept of a network and the way to solve the problem of conflicting use of scarce resources. That is a situation we often see in networks: competition, the tragedy of the commons, and convergence achieved by optimization using information fed back into the system. If you think about it, that's how many other networks work; it is a much more general principle than it looks.
Amazon Recommendations
Recommendations on Amazon's online store are based on a voting system in which customers voluntarily rate products. The reviewers are themselves evaluated through a voting system in which the customer community participates. The Amazon system does not rank by the average vote alone; it also takes into account the effect called the "wisdom of crowds" and other such biases.
In 1906, at a country fair in Plymouth, England, Sir Francis Galton carried out a very simple experiment that became important to the science of statistics. Galton collected the estimates of 787 ordinary people who were asked to guess the weight of an ox. Everyone viewed the ox, wrote their estimate of the weight on a ticket and dropped it in an urn.
Among those who guessed the weight were a few farmers and butchers, but most were ordinary people who had no idea how to estimate the weight of an ox. Against intuition, the average estimate was 1,197 pounds, while the actual weight of the ox was 1,198 pounds, an accuracy of 99.92%. And this was not a coincidence, because the experiment has been repeated countless times since then with similar results. This gave rise to a mathematical principle called the "wisdom of crowds". Francis Galton's original paper in Nature, Vox Populi, can be read here.
This has a mathematical explanation. We can say that, if certain conditions are met, the estimates of an ignorant crowd can be more accurate than those of experts, because the errors of underestimation and overestimation tend to cancel out in the average, which thus converges to the correct value. Three conditions must be met for this to occur:
1.- There has to be a correct, objective answer.
2.- The estimates must be completely independent and unbiased (no mutual influence and no preconceived ideas).
3.- The number of independent estimates must be large enough (a major problem is defining exactly what "large enough" means).
If ignorant crowds are wise because errors compensate each other, then why do democracies perform so badly? I think it is obvious that condition (2) is not met by either of the two parties, which explains why the wisdom of crowds does not work in politics.
This link shows an Amazon page with the camera of my dreams; next to the product it shows the recommendation stars and a number.
This particular camera has earned 4 out of 5 stars and has been reviewed by 41 customers. This is very simple: the 4 stars are the mean of the 41 scores, rounded. The problem arises if I want to compare it with, say, the Nikon D3100, which has a similar price. The Nikon also has 4 out of 5 stars, but 425 people rated it.
Intuitively, we feel that a score of 4 based on 425 reviews is better than a score of 4 based on 41 reviews, and much better than a score of 4 based on, say, 2 reviews. But what about a camera that has a score of 3 with 10,000 reviews? Ranking is a problem analogous to sorting Google search results: there is not one criterion but several, which are combined through some sophisticated mathematical analysis.
On this page, which ranks all cameras, you can see that they are not ordered simply by their scores, but by an algorithm that considers the score, the number of ratings and other factors.
Estimates can be modeled with the following formula:
y_i(x) = x + e_i(x)
where y_i(x) is the estimate, x the actual value and e_i(x) the error of each estimate. The average error of the individual estimators can be measured with the mean squared error, defined by
E_avg = (1/N) * SUM_{i=1..N} E[e_i(x)^2]
that is, the sum of the expected squared errors divided by the N estimates. But there is also the error of the crowd's average estimate, which after some algebraic manipulation turns out to be E_crowd = (1/N) * E_avg, i.e. the average individual error divided by N, and it is usually much smaller than the average error of the individual estimates.
The factor (1/N), which relates the average individual error to the error of the crowd's average, is the mathematical expression of the "wisdom of crowds"; N is the number of people making the estimate, and you can see that if N is very large the crowd's error decreases. The higher N is, the lower the crowd's error, assuming the estimates are independent and unbiased so that the individual errors cancel out.
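A quick numerical sketch (the error spread of 100 pounds is invented, chosen only for illustration) shows this 1/N effect: the squared error of the crowd's mean comes out roughly N times smaller than the average squared error of the individuals.

```python
# Simulate N independent, unbiased guesses of the ox's weight and compare
# the crowd's error with the average individual error.
import random

true_weight = 1198          # the actual weight of the "ox" (pounds)
N = 787                     # number of estimators, as in Galton's experiment
random.seed(1)

estimates = [true_weight + random.gauss(0, 100) for _ in range(N)]  # unbiased errors
crowd_mean = sum(estimates) / N

avg_individual_sq_error = sum((e - true_weight) ** 2 for e in estimates) / N
crowd_sq_error = (crowd_mean - true_weight) ** 2

print("average individual squared error:", round(avg_individual_sq_error))
print("squared error of the crowd's mean:", round(crowd_sq_error))
# The second number is roughly (1/N) times the first.
```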
This concept is related to a controversial branch of statistics called "Bayesian estimation", which assumes that past history gives information about the future, that is, that we have prior knowledge of the phenomenon. The work of Thomas Bayes was continued by Pierre-Simon Laplace, who posed the following problem: "If, during the thousands of years on record, we observe that the sun has risen every day, what is the probability that the sun will also rise tomorrow?"
In classical statistics we remain in a state of ignorance about future events: even when an event has been repeated many times, we cannot deduce anything about the next occurrence. The two kinds of estimation rest on philosophically different conceptual foundations.
According to classical statistics, if I run a sequence of n experiments and obtain a certain result, say "1", in s of them, what is the probability of getting a "1" the next time? The classical probability is exactly s/n, but in the Bayesian view it also depends on how many times that value was obtained before. Some integral calculus leads to a simple solution that incorporates this "history": the Bayesian estimate is (s + 1) / (n + 2).
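A small sketch comparing the two estimates (the sample sizes are arbitrary):

```python
# Laplace's rule of succession next to the classical frequency estimate.
def classical(s, n):
    return s / n

def rule_of_succession(s, n):
    return (s + 1) / (n + 2)

for s, n in [(3, 3), (9, 10), (9000, 10000)]:
    print(s, n, round(classical(s, n), 4), round(rule_of_succession(s, n), 4))
# With little data the Bayesian estimate is pulled away from the extremes
# (3 out of 3 gives 0.8, not 1.0); with lots of data the two nearly coincide.
```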
This introduces a factor that keeps the probability of something happening within a range between the value for something that has never happened before and the value for something that has happened many times. This is how the wisdom of the crowds is used to calculate the Amazon rankings.
Of course, the exact algorithm is Amazon's secret, but reverse engineering suggests that it is based on the average scores, adjusted by a Bayesian estimation, plus a bias depending on the quality of the reviewers. The rankings suggest that Amazon probably incorporates the Bayesian factor in a formula like this:
BayesianAmazon_i = (Nprom * R + n_i * r_i) / (Nprom + n_i)
In a Bayesian estimation, N would be the total number of observations, but it can also be a minimum, maximum or average; Amazon seems to use an average (Nprom) or a maximum. R is the average of all the scores across items, n_i is the number of ratings of item i, and r_i is the average score of item i.
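A hedged sketch of such a Bayesian-average ranking, where the global average score and the prior weight are assumed values and the review counts echo the camera examples above:

```python
# Bayesian-average ranking sketch: pull each item's score toward the global
# average, with a strength set by the prior weight N_prior. All numbers invented.
def bayesian_score(r_i, n_i, R, N_prior):
    """Average score r_i from n_i reviews, shrunk toward the global average R."""
    return (N_prior * R + n_i * r_i) / (N_prior + n_i)

R = 3.9          # assumed average score over all cameras
N_prior = 100    # assumed average number of reviews per camera

cameras = [("camera A", 4.0, 41), ("Nikon D3100", 4.0, 425), ("camera C", 3.0, 10000)]
for name, r_i, n_i in cameras:
    print(name, round(bayesian_score(r_i, n_i, R, N_prior), 3))
# With few reviews the score is pulled toward the global average, so the
# 4-star camera with 425 reviews ranks above the 4-star camera with 41.
```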
This, like everything else we have seen in the course, is not only about Amazon's specific problem, but about the general problem of constructing rankings and incorporating the "wisdom of crowds" into estimates; both are fundamental problems in the network economy.