Context/Note on the confidence interval for data from sensors

Introduction

While reading lux data from several smartphones, we noticed that different models of the Samsung Galaxy family (S1, S2, SIII, S4) reported different values under the same conditions. While brainstorming how to obtain reliable data, a new question arose: do all smartphones of the same model report the same data?

Methodology

Using two Samsung Galaxy S1 Plus devices, we took readings throughout the Technosite headquarters. For each pair of readings, both devices were at exactly the same location, and the time window between the two readings was short enough not to influence the values.

                   deviceA      deviceB
model number       GT-I9001     GT-I9001
firmware version   2.3.4        2.3.6
baseband version   I9001BVKPB   I9001BUKP4

The readings were captured by photographing them with a third smartphone. Some photographs were blurred; in that case, the pair of readings from that location was discarded, regardless of which of the two readings was blurred.

Results

Each row shows the readings taken at the same location by the two smartphones (same model) and the absolute value of their difference.

deviceB deviceA |A-B|
114 112 2
121 110 11
980 942 38
1004 1020 16
1016 1014 2
1034 1038 4
1088 1281 193
1180 1250 70
1259 1174 85
1262 1289 27
1285 1089 196


Figure: linear regression for the table above. At the higher values, the readings deviate more from the fitted line.

As the figure shows, there is no 1:1 mapping between the values. Although the difference between values is greater at the higher end, we could simplify by saying that the readings differ by about 5%. (The fitted line shown here is y = f(x) = 0.983x + 13.838.)
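The fit can be rechecked from the table with a few lines of Python. This is a minimal sketch, not the original analysis: it assumes an ordinary least-squares fit with deviceB as x and deviceA as y (the source does not say which device is on which axis), and the function and variable names are hypothetical.

```python
# Paired lux readings copied from the results table: (deviceB, deviceA).
readings = [
    (114, 112), (121, 110), (980, 942), (1004, 1020), (1016, 1014),
    (1034, 1038), (1088, 1281), (1180, 1250), (1259, 1174),
    (1262, 1289), (1285, 1089),
]


def least_squares(pairs):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Assumption: x = deviceB, y = deviceA.
slope, intercept = least_squares(readings)
print(f"deviceA = {slope:.3f} * deviceB + {intercept:.3f}")

# Gap between the two devices at each location, relative to the pair's mean.
relative_gaps = [abs(a - b) / ((a + b) / 2) for b, a in readings]
print(f"largest relative gap: {max(relative_gaps):.1%}")
```

Under that axis assumption the coefficients should come out close to the ones quoted above. Swapping the axes gives a different line, so the orientation matters when comparing against the figure.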

Discussion

Even without further analysis, it is clear that the data source should not be trusted completely. One good approach would be to attach a confidence interval to the readings (and hence to the minirules that use them), but this could also complicate reasoning about the data (it is a step towards fuzzy logic). To keep things simpler during the first stages of Cloud4All/GPII, we recommend instead using only a few clusters for any category of readings. For lux readings, for example, we may prefer just 5 clusters. If, after some testing with users, that proved too coarse-grained, we could try 7 values, but not many more, in order to avoid having many frontiers and thus many potential sources of conflict. For values close to the frontiers between clusters, a second reading and the notion of inertia could also help.
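One way to picture the combination of a few clusters and inertia is a classifier with hysteresis: the current cluster is kept until a reading crosses a frontier by more than some margin. The sketch below is only an illustration under stated assumptions. The frontier values, the 5% margin (borrowed from the rough inter-device difference above), and all names are hypothetical and not part of Cloud4All/GPII.

```python
from bisect import bisect_right

# Hypothetical frontiers (in lux) that split readings into 5 clusters.
# These numbers are illustrative assumptions, not values from the study.
FRONTIERS = [50, 200, 500, 1000]


def cluster_of(lux):
    """Index (0..4) of the cluster that a raw lux reading falls into."""
    return bisect_right(FRONTIERS, lux)


class InertialClassifier:
    """Keeps the current cluster unless a reading clearly crosses a frontier."""

    def __init__(self, margin=0.05):
        self.margin = margin  # assumed tolerance around each frontier
        self.current = None

    def update(self, lux):
        candidate = cluster_of(lux)
        if self.current is None or candidate == self.current:
            self.current = candidate
            return self.current
        if candidate > self.current:
            # Moving up: require the reading to exceed the upper frontier
            # of the current cluster by the margin.
            frontier = FRONTIERS[self.current]
            crossed = lux >= frontier * (1 + self.margin)
        else:
            # Moving down: require the reading to fall below the lower
            # frontier of the current cluster by the margin.
            frontier = FRONTIERS[self.current - 1]
            crossed = lux < frontier * (1 - self.margin)
        if crossed:
            self.current = candidate
        return self.current


classifier = InertialClassifier()
history = [classifier.update(lux) for lux in (980, 1020, 1100, 990, 900)]
print(history)
```

With these assumed frontiers, readings such as 1004 and 1020 from the two devices at the same location would not flip the cluster back and forth, because neither clears the frontier at 1000 by the margin.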