Crowdsourced air traffic data from The OpenSky Network 2020

    Source: https://zenodo.org/record/5092942#.YRBCyTpRXYd

    Martin Strohmeier, Xavier Olive, Jannis Lübbe, Matthias Schäfer, and Vincent Lenders
    “Crowdsourced air traffic data from the OpenSky Network 2019–2020”
    Earth System Science Data 13(2), 2021

    Run the command:

    Download will take about 2 minutes with good internet connection. There are 30 files with total size of 4.3 GB.

    1. (
    2. callsign String,
    3. number String,
    4. icao24 String,
    5. registration String,
    6. typecode String,
    7. origin String,
    8. destination String,
    9. firstseen DateTime,
    10. lastseen DateTime,
    11. day DateTime,
    12. latitude_1 Float64,
    13. longitude_1 Float64,
    14. altitude_1 Float64,
    15. latitude_2 Float64,
    16. longitude_2 Float64,
    17. altitude_2 Float64
    18. ) ENGINE = MergeTree ORDER BY (origin, destination, callsign);

    Upload data into ClickHouse in parallel:

    1. ls -1 flightlist_*.csv.gz | xargs -P100 -I{} bash -c 'gzip -c -d "{}" | clickhouse-client --date_time_input_format best_effort --query "INSERT INTO opensky FORMAT CSVWithNames"'
    • Here we pass the list of files (ls -1 flightlist_*.csv.gz) to xargs for parallel processing.
      xargs -P100 specifies to use up to 100 parallel workers but as we only have 30 files, the number of workers will be only 30.
    • For every file, xargs will run a script with bash -c. The script has substitution in form of {} and the xargs command will substitute the filename to it (we have asked it for xargs with -I{}).
    • The script will decompress the file (gzip -c -d "{}") to standard output (-c parameter) and the output is redirected to clickhouse-client.
    • We also asked to parse DateTime fields with extended parser () to recognize ISO-8601 format with timezone offsets.

    Finally, clickhouse-client will do insertion. It will read input data in CSVWithNames format.

    Parallel upload takes 24 seconds.

    1. for file in flightlist_*.csv.gz; do gzip -c -d "$file" | clickhouse-client --date_time_input_format best_effort --query "INSERT INTO opensky FORMAT CSVWithNames"; done

    Query:

    1. SELECT count() FROM opensky;

    Result:

    The size of dataset in ClickHouse is just 2.66 GiB, check it.

    Query:

    1. SELECT formatReadableSize(total_bytes) FROM system.tables WHERE name = 'opensky';

    Result:

    1. ┌─formatReadableSize(total_bytes)─┐
    2. 2.66 GiB
    3. └─────────────────────────────────┘

    Total distance travelled is 68 billion kilometers.

    Query:

    1. SELECT formatReadableQuantity(sum(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2)) / 1000) FROM opensky;
    1. ┌─formatReadableQuantity(divide(sum(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2)), 1000))─┐
    2. 68.72 billion
    3. └──────────────────────────────────────────────────────────────────────────────────────────────────────────┘

    Average flight distance is around 1000 km.

    Query:

    Result:

    1. ┌─avg(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2))─┐
    2. 1041090.6465708319
    3. └────────────────────────────────────────────────────────────────────┘

    Query:

    1. SELECT
    2. origin,
    3. count(),
    4. round(avg(geoDistance(longitude_1, latitude_1, longitude_2, latitude_2))) AS distance,
    5. bar(distance, 0, 10000000, 100) AS bar
    6. FROM opensky
    7. WHERE origin != ''
    8. GROUP BY origin
    9. ORDER BY count() DESC

    Result:

    1. ┌─origin─┬─count()─┬─distance─┬─bar────────────────────────────────────┐
    2. 1. KORD 745007 1546108 ███████████████▍
    3. 2. KDFW 696702 1358721 █████████████▌
    4. 3. KATL 667286 1169661 ███████████▋
    5. 4. KDEN 582709 1287742 ████████████▊
    6. 5. KLAX 581952 2628393 ██████████████████████████▎
    7. 6. KLAS 447789 1336967 █████████████▎
    8. 7. KPHX 428558 1345635 █████████████▍
    9. 8. KSEA 412592 1757317 █████████████████▌
    10. 9. KCLT 404612 880355 ████████▋
    11. 10. VIDP 363074 1445052 ██████████████▍
    12. 11. EDDF 362643 2263960 ██████████████████████▋
    13. 12. KSFO 361869 2445732 ████████████████████████▍
    14. 13. KJFK 349232 2996550 █████████████████████████████▊
    15. 14. KMSP 346010 1287328 ████████████▋
    16. 15. LFPG 344748 2206203 ██████████████████████
    17. 16. EGLL 341370 3216593 ████████████████████████████████▏
    18. 17. EHAM 340272 2116425 █████████████████████▏
    19. 18. KEWR 337696 1826545 ██████████████████▎
    20. 19. KPHL 320762 1291761 ████████████▊
    21. 20. OMDB 308855 2855706 ████████████████████████████▌
    22. 21. UUEE 307098 1555122 ███████████████▌
    23. 22. KBOS 304416 1621675 ████████████████▏
    24. 23. LEMD 291787 1695097 ████████████████▊
    25. 24. YSSY 272979 1875298 ██████████████████▋
    26. 25. KMIA 265121 1923542 ███████████████████▏
    27. 26. ZGSZ 263497 745086 ███████▍
    28. 27. EDDM 256691 1361453 █████████████▌
    29. 28. WMKK 254264 1626688 ████████████████▎
    30. 29. CYYZ 251192 2175026 █████████████████████▋
    31. 30. KLGA 248699 1106935 ███████████
    32. 31. VHHH 248473 3457658 ██████████████████████████████████▌
    33. 32. RJTT 243477 1272744 ████████████▋
    34. 33. KBWI 241440 1187060 ███████████▋
    35. 34. KIAD 239558 1683485 ████████████████▋
    36. 35. KIAH 234202 1538335 ███████████████▍
    37. 36. KFLL 223447 1464410 ██████████████▋
    38. 37. KDAL 212055 1082339 ██████████▋
    39. 38. KDCA 207883 1013359 ██████████▏
    40. 39. LIRF 207047 1427965 ██████████████▎
    41. 40. PANC 206007 2525359 █████████████████████████▎
    42. 41. LTFJ 205415 860470 ████████▌
    43. 42. KDTW 204020 1106716 ███████████
    44. 43. VABB 201679 1300865 █████████████
    45. 44. OTHH 200797 3759544 █████████████████████████████████████▌
    46. 45. KMDW 200796 1232551 ████████████▎
    47. 46. KSAN 198003 1495195 ██████████████▊
    48. 47. KPDX 197760 1269230 ████████████▋
    49. 48. SBGR 197624 2041697 ████████████████████▍
    50. 49. VOBL 189011 1040180 ██████████▍
    51. 50. LEBL 188956 1283190 ████████████▋
    52. 51. YBBN 188011 1253405 ████████████▌
    53. 52. LSZH 187934 1572029 ███████████████▋
    54. 53. YMML 187643 1870076 ██████████████████▋
    55. 55. KSNA 180045 778484 ███████▋
    56. 56. EGKK 176420 1694770 ████████████████▊
    57. 57. LOWW 176191 1274833 ████████████▋
    58. 58. UUDD 176099 1368226 █████████████▋
    59. 59. RKSI 173466 3079026 ██████████████████████████████▋
    60. 60. EKCH 172128 1229895 ████████████▎
    61. 61. KOAK 171119 1114447 ███████████▏
    62. 62. RPLL 170122 1440735 ██████████████▍
    63. 63. KRDU 167001 830521 ████████▎
    64. 64. KAUS 164524 1256198 ████████████▌
    65. 65. KBNA 163242 1022726 ██████████▏
    66. 66. KSDF 162655 1380867 █████████████▋
    67. 67. ENGM 160732 910108 █████████
    68. 68. LIMC 160696 1564620 ███████████████▋
    69. 69. KSJC 159278 1081125 ██████████▋
    70. 70. KSTL 157984 1026699 ██████████▎
    71. 71. UUWW 156811 1261155 ████████████▌
    72. 72. KIND 153929 987944 █████████▊
    73. 73. ESSA 153390 1203439 ████████████
    74. 74. KMCO 153351 1508657 ███████████████
    75. 75. KDVT 152895 74048
    76. 76. VTBS 152645 2255591 ██████████████████████▌
    77. 77. CYVR 149574 2027413 ████████████████████▎
    78. 78. EIDW 148723 1503985 ███████████████
    79. 79. LFPO 143277 1152964 ███████████▌
    80. 80. EGSS 140830 1348183 █████████████▍
    81. 81. KAPA 140776 420441 ████▏
    82. 82. KHOU 138985 1068806 ██████████▋
    83. 83. KTPA 138033 1338223 █████████████▍
    84. 84. KFFZ 137333 55397
    85. 85. NZAA 136092 1581264 ███████████████▋
    86. 86. YPPH 133916 1271550 ████████████▋
    87. 87. RJBB 133522 1805623 ██████████████████
    88. 88. EDDL 133018 1265919 ████████████▋
    89. 89. ULLI 130501 1197108 ███████████▊
    90. 90. KIWA 127195 250876 ██▌
    91. 91. KTEB 126969 1189414 ███████████▊
    92. 92. VOMM 125616 1127757 ███████████▎
    93. 93. LSGG 123998 1049101 ██████████▍
    94. 94. LPPT 122733 1779187 █████████████████▋
    95. 95. WSSS 120493 3264122 ████████████████████████████████▋
    96. 96. EBBR 118539 1579939 ███████████████▋
    97. 97. VTBD 118107 661627 ██████▌
    98. 98. KVNY 116326 692960 ██████▊
    99. 99. EDDT 115122 941740 █████████▍
    100. 100. EFHK 114860 1629143 ████████████████▎
    101. └────────┴─────────┴──────────┴────────────────────────────────────────┘

    Query:

    1. SELECT
    2. toMonday(day) AS k,
    3. count() AS c,
    4. bar(c, 0, 10000, 100) AS bar
    5. FROM opensky
    6. WHERE origin IN ('UUEE', 'UUDD', 'UUWW')
    7. GROUP BY k

    Result: