Context/Note on the confidence interval for data from sensors
While reading lux data from several smartphones, we noticed that different models of the Samsung Galaxy family (S1, S2, SIII, S4) reported different values under the same conditions. After some brainstorming about how to obtain reliable data, a new question arose: does every smartphone of the same model report the same data?
Using two Samsung Galaxy S1 Plus phones, we took readings all across Technosite headquarters. For each pair of readings the location is exactly the same, and the time window is small enough not to influence the readings taken at that location.
The readings were captured by taking photographs with a third smartphone. Some photographs were blurred; in that case the whole pair of readings for that location was discarded, regardless of which of the two readings was blurred.
Every tuple represents the readings at the same location from two different smartphones (same model) and the absolute value of their difference.
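The tuple construction described above can be sketched in a few lines of Python. This is only an illustration: the function name `pair_readings` and the convention of marking a blurred reading as `None` are assumptions made for the example, not part of the original procedure.

```python
from typing import List, Optional, Tuple

def pair_readings(
    phone_a: List[Optional[float]],
    phone_b: List[Optional[float]],
) -> List[Tuple[float, float, float]]:
    """Build (reading_a, reading_b, |a - b|) tuples, one per location.

    A reading of None stands for a blurred photograph (an assumed
    convention); the whole pair for that location is then discarded,
    whichever of the two readings was blurred.
    """
    if len(phone_a) != len(phone_b):
        raise ValueError("both phones need one reading per location")
    tuples = []
    for a, b in zip(phone_a, phone_b):
        if a is None or b is None:
            continue  # discard the pair if either reading is unusable
        tuples.append((a, b, abs(a - b)))
    return tuples
```

The input lists are expected to be ordered by location, so that position i in both lists refers to the same spot.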
As the figure shows, there is no 1:1 mapping between the values. Although the difference between values is greater at the higher end, we could simplify by saying that they differ by about 5%. (Actually, the values shown here follow y = f(x) = (0.983*x) + 13.838.)
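As a rough aid, the linear fit quoted above (slope 0.983, intercept 13.838) can be evaluated directly to see how far one phone's expected reading sits from the other's at a given lux value. The function names are hypothetical, and which phone plays the role of x is an assumption the note does not settle.

```python
SLOPE = 0.983       # slope of the fit quoted in the note
INTERCEPT = 13.838  # intercept of the fit quoted in the note

def predicted_reading(x: float) -> float:
    """Expected reading y of the second phone for a reading x of the first."""
    return SLOPE * x + INTERCEPT

def relative_difference(x: float) -> float:
    """Gap between the two phones, as a fraction of the first reading x."""
    if x <= 0:
        raise ValueError("x must be a positive lux value")
    return abs(predicted_reading(x) - x) / x
```

Because the fit has a non-zero intercept, the relative gap it predicts changes with x, so any single percentage is only a rough summary of the measured pairs.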
Without further analysis, one can already see that it is safer not to trust the data source completely. A good approach could be to include a confidence interval in the readings (and hence in the minirules that exploit them), but this could also complicate reasoning about the data (it is a good step towards fuzzy logic). To keep things simpler during the first stages of Cloud4All/GPII, we recommend instead not creating many clusters for any category of readings. For example, for lux readings, we may prefer just 5 different clusters. If, after some testing with users, that proved too coarse-grained, we could try 7 values, but not many more, in order to avoid having many frontiers and thus many potential sources of conflict. For values close to the frontiers between clusters, a second reading and the notion of inertia could also help.
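One possible reading of "few clusters plus inertia" is sketched below. Everything specific in it is an assumption made for illustration: the four boundary values (which yield five clusters), the 5% margin borrowed from the rough figure above, and the names `cluster_of` and `InertialClassifier`. The idea is that a reading only moves the classifier to another cluster once it clears the frontier by more than the margin, so values hovering next to a frontier do not cause flapping.

```python
from bisect import bisect_right

# Hypothetical frontiers in lux; four frontiers give five clusters (0..4).
BOUNDARIES = [50, 200, 1000, 10000]
# Assumed relative tolerance, taken from the rough 5% figure in the note.
MARGIN = 0.05

def cluster_of(lux: float) -> int:
    """Plain clustering: index of the cluster that contains the reading."""
    return bisect_right(BOUNDARIES, lux)

class InertialClassifier:
    """Keeps the previous cluster unless a reading clearly leaves it."""

    def __init__(self) -> None:
        self.current = None  # no cluster until the first reading arrives

    def update(self, lux: float) -> int:
        candidate = cluster_of(lux)
        if self.current is None or candidate == self.current:
            self.current = candidate
        elif candidate > self.current:
            # Moving up: the reading must exceed the upper frontier of the
            # current cluster by more than the margin.
            if lux > BOUNDARIES[self.current] * (1 + MARGIN):
                self.current = candidate
        else:
            # Moving down: the reading must fall below the lower frontier
            # of the current cluster by more than the margin.
            if lux < BOUNDARIES[self.current - 1] * (1 - MARGIN):
                self.current = candidate
        return self.current
```

With these assumed values, a classifier currently in the 50 to 200 lux cluster stays there for a reading of 205 and only moves up once a reading exceeds 210. Going from 5 to 7 clusters would mean adding two more entries to `BOUNDARIES`, and with them two more frontiers where such conflicts can occur.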