Bug #1778
Bug in geometry download by run number
100%
Description
There's a bug in the SQL logic used to retrieve geometry by run number.
Updated by Rajaram, Durga over 7 years ago
Email from Janusz, September 24
Hi, We had a closer look with Durga at the algorithm the CDB is using to retrieve the geometry by run. The key is to find a start run timestamp fist, then to fine a recored in the geometry table corresponding to that run. To explain how it is (Incorrectly) done let’s have a look at the tail of the geometry table: cdb=# select id, validfromtime, creationtime from geometry where id > 55; id | validfromtime | creationtime ----+---------------------+------------------------- 56 | 2035-08-18 10:02:00 | 2015-08-18 12:13:13.958 57 | 2035-08-18 10:02:00 | 2015-08-18 12:13:27.307 58 | 2015-07-13 10:00:00 | 2015-08-24 17:00:49.556 59 | 2015-07-13 10:00:00 | 2015-08-24 17:25:24.358 60 | 2015-07-13 10:00:00 | 2015-08-25 16:32:23.453 61 | 2015-06-20 10:00:00 | 2015-08-25 17:21:04.021 62 | 2015-07-13 10:00:00 | 2015-08-25 17:21:09.115 63 | 2015-06-20 10:00:00 | 2015-08-26 14:11:35.566 64 | 2015-07-13 10:00:00 | 2015-08-26 14:11:41.934 65 | 2015-06-20 10:00:00 | 2015-08-31 21:34:06.09 66 | 2015-07-13 10:00:00 | 2015-08-31 21:34:12.149 67 | 2015-06-20 10:00:00 | 2015-09-01 12:06:59.434 68 | 2015-07-13 10:00:00 | 2015-09-01 12:07:59.886 69 | 2015-06-20 12:00:00 | 2015-09-14 16:08:42.336 70 | 2015-07-13 12:01:00 | 2015-09-14 16:09:16.086 Now lets assume there is a run which started at '2015-07-20 11:19:45’. The query should find the latest valid geometry with the validity time <= the timestamp above, and if multiple records exist it should take one with the latest creation time. The query which is actually performed is done in 2 steps: 1) find the max(creationtime) select max(creationtime) from geometry where validfromtime <= '2015-07-20 11:19:45' and id <= 69; (note <=69) this way I’m ‘unfixing’ Ryan’s desperate attempt of getting the right answer from a broken system ;-) ) This returns: max ------------------------- 2015-09-14 16:08:42.336 and the only record with this timestamp is one from June ! Indeed, the next step is: 2) select id, validfromtime, creationtime from geometry where creationtime='2015-09-14 16:08:42.336' and validfromtime <= '2015-07-20 11:19:45' and id <= 69; which gives: id | validfromtime | creationtime ----+---------------------+------------------------- 69 | 2015-06-20 12:00:00 | 2015-09-14 16:08:42.336 But a careful reader should notice that the right record should be one from July, and if many, one from the latest creation time (id=68). FIX: Indeed: cdb=# select id, validfromtime, creationtime from geometry where validfromtime=(SELECT max(validfromtime) from geometry WHERE validfromtime <= '2015-07-20 11:19:45' and id <= 69); id | validfromtime | creationtime ----+---------------------+------------------------- 58 | 2015-07-13 10:00:00 | 2015-08-24 17:00:49.556 59 | 2015-07-13 10:00:00 | 2015-08-24 17:25:24.358 60 | 2015-07-13 10:00:00 | 2015-08-25 16:32:23.453 62 | 2015-07-13 10:00:00 | 2015-08-25 17:21:09.115 64 | 2015-07-13 10:00:00 | 2015-08-26 14:11:41.934 66 | 2015-07-13 10:00:00 | 2015-08-31 21:34:12.149 68 | 2015-07-13 10:00:00 | 2015-09-01 12:07:59.886 (7 rows) or more precisely: cdb=# select id, validfromtime, creationtime from geometry where validfromtime=(SELECT max(validfromtime) from geometry WHERE validfromtime <= '2015-07-20 11:19:45' and id <= 69) ORDER BY creationtime DESC LIMIT 1; id | validfromtime | creationtime ----+---------------------+------------------------- 68 | 2015-07-13 10:00:00 | 2015-09-01 12:07:59.886 (1 row) Topic opened for discussion ;-) . A real query selects vdml, not the stuff above, but doing so here would be impractical for demonstration purposes. There is also getCurrentGDML, but this one seems to use a similar max(validfromtime) approach, so it most likely does not suffer from the bug, but I have to check this more carefully. cheers JM (I’ll file a ticket after lunch) cheers JM
Updated by Bayes, Ryan over 7 years ago
I think I follow and agree with Janusz' logic. I am also planning on adding a line to the next geometry upload so that this can be explicitly verified --- the GDML files considered in this discussion only have very minor differences between them so it would be useful to have a line in the Maus Information file that explicitly states the geometry ID independent of whether the geometry is downloaded by run or as the current geometry.
Updated by Martyniak, Janusz over 7 years ago
Hello,
I applied a fix and installed it on preprod. I ran it for runnumber=7357.
The query as performed by the server is:
query: SELECT starttime FROM run WHERE runnumber=7357;
query: SELECT gdml FROM geometry WHERE validfromtime = (SELECT max(validfromtime) FROM geometry WHERE validfromtime <= ('2015-09-17 11:23:35.0')) ORDER BY creationtime DESC LIMIT 1;
as it should be according to the bug description.
Since it returns a gdml blob to Python I cannot really test if what is returned is what we are actually expecting to get.
Anyone, who failed before for certain run numbers (what run numbers ??) could re-run their tests and see if the result is correct ?
I did the following:
1) we know the run number, so,
g_by_run = server.get_gdml_for_run(7357)
exectuting the query manually on preprod gives id=685 (and the whole list when LIMIT 1 is omitted, so there are several geometries for the same validfromtime.)
2) so we know the id:
g_by_id = server.get_gdml_for_id(685)
type(g_by_run)
<type 'str’>
so fingers crossed:
g_by_run == g_by_id
True
Bingo !
Updated by Rajaram, Durga about 7 years ago
- Status changed from Open to Closed
- % Done changed from 0 to 100
This is now on the production master. Closing issue.