cloudflare/ClickHouse
Publicmirrored fromhttps://github.com/cloudflare/ClickHouse
benchmark/monetdb/instruction.md
356lines · modecode
unknown
| 1 | Go to https://www.monetdb.org/ |
| 2 | |
| 3 | Dowload now. |
| 4 | Latest binary releases. |
| 5 | Ubuntu & Debian. |
| 6 | |
| 7 | https://www.monetdb.org/downloads/deb/ |
| 8 | |
| 9 | Go to the server where you want to install MonetDB. |
| 10 | ``` |
| 11 | $ sudo mcedit /etc/apt/sources.list.d/monetdb.list |
| 12 | ``` |
| 13 | Write: |
| 14 | ``` |
| 15 | deb https://dev.monetdb.org/downloads/deb/ bionic monetdb |
| 16 | ``` |
| 17 | |
| 18 | ``` |
| 19 | $ wget --output-document=- https://www.monetdb.org/downloads/MonetDB-GPG-KEY | sudo apt-key add - |
| 20 | |
| 21 | $ sudo apt update |
| 22 | $ sudo apt install monetdb5-sql monetdb-client |
| 23 | |
| 24 | $ sudo systemctl enable monetdbd |
| 25 | $ sudo systemctl start monetdbd |
| 26 | $ sudo usermod -a -G monetdb $USER |
| 27 | ``` |
| 28 | |
| 29 | Logout and login back to your server. |
| 30 | |
| 31 | Tutorial: |
| 32 | https://www.monetdb.org/Documentation/UserGuide/Tutorial |
| 33 | |
| 34 | Creating the database: |
| 35 | |
| 36 | ``` |
| 37 | $ sudo mkdir /opt/monetdb |
| 38 | $ sudo chmod 777 /opt/monetdb |
| 39 | $ monetdbd create /opt/monetdb |
| 40 | |
| 41 | $ monetdbd start /opt/monetdb |
| 42 | cannot remove socket files |
| 43 | ``` |
| 44 | |
| 45 | Now you have to stop MonetDB, copy the contents of `/var/monetdb5` to `/opt/monetdb` and replace the `/var/monetdb5` with symlink to `/opt/monetdb`. This is necessary, because I don't have free space in `/var` and creation of database in `/opt` did not succeed. |
| 46 | |
| 47 | Start MonetDB again. |
| 48 | |
| 49 | ``` |
| 50 | $ sudo systemctl start monetdbd |
| 51 | ``` |
| 52 | |
| 53 | ``` |
| 54 | $ monetdb create test |
| 55 | created database in maintenance mode: test |
| 56 | |
| 57 | $ monetdb release test |
| 58 | taken database out of maintenance mode: test |
| 59 | ``` |
| 60 | |
| 61 | Run client: |
| 62 | ``` |
| 63 | $ mclient -u monetdb -d test |
| 64 | ``` |
| 65 | |
| 66 | Type password: monetdb |
| 67 | |
| 68 | ``` |
| 69 | CREATE TABLE hits |
| 70 | ( |
| 71 | "WatchID" BIGINT, |
| 72 | "JavaEnable" TINYINT, |
| 73 | "Title" TEXT, |
| 74 | "GoodEvent" SMALLINT, |
| 75 | "EventTime" TIMESTAMP, |
| 76 | "EventDate" Date, |
| 77 | "CounterID" INTEGER, |
| 78 | "ClientIP" INTEGER, |
| 79 | "RegionID" INTEGER, |
| 80 | "UserID" BIGINT, |
| 81 | "CounterClass" TINYINT, |
| 82 | "OS" TINYINT, |
| 83 | "UserAgent" TINYINT, |
| 84 | "URL" TEXT, |
| 85 | "Referer" TEXT, |
| 86 | "Refresh" TINYINT, |
| 87 | "RefererCategoryID" SMALLINT, |
| 88 | "RefererRegionID" INTEGER, |
| 89 | "URLCategoryID" SMALLINT, |
| 90 | "URLRegionID" INTEGER, |
| 91 | "ResolutionWidth" SMALLINT, |
| 92 | "ResolutionHeight" SMALLINT, |
| 93 | "ResolutionDepth" TINYINT, |
| 94 | "FlashMajor" TINYINT, |
| 95 | "FlashMinor" TINYINT, |
| 96 | "FlashMinor2" TEXT, |
| 97 | "NetMajor" TINYINT, |
| 98 | "NetMinor" TINYINT, |
| 99 | "UserAgentMajor" SMALLINT, |
| 100 | "UserAgentMinor" TEXT, |
| 101 | "CookieEnable" TINYINT, |
| 102 | "JavascriptEnable" TINYINT, |
| 103 | "IsMobile" TINYINT, |
| 104 | "MobilePhone" TINYINT, |
| 105 | "MobilePhoneModel" TEXT, |
| 106 | "Params" TEXT, |
| 107 | "IPNetworkID" INTEGER, |
| 108 | "TraficSourceID" TINYINT, |
| 109 | "SearchEngineID" SMALLINT, |
| 110 | "SearchPhrase" TEXT, |
| 111 | "AdvEngineID" TINYINT, |
| 112 | "IsArtifical" TINYINT, |
| 113 | "WindowClientWidth" SMALLINT, |
| 114 | "WindowClientHeight" SMALLINT, |
| 115 | "ClientTimeZone" SMALLINT, |
| 116 | "ClientEventTime" TIMESTAMP, |
| 117 | "SilverlightVersion1" TINYINT, |
| 118 | "SilverlightVersion2" TINYINT, |
| 119 | "SilverlightVersion3" INTEGER, |
| 120 | "SilverlightVersion4" SMALLINT, |
| 121 | "PageCharset" TEXT, |
| 122 | "CodeVersion" INTEGER, |
| 123 | "IsLink" TINYINT, |
| 124 | "IsDownload" TINYINT, |
| 125 | "IsNotBounce" TINYINT, |
| 126 | "FUniqID" BIGINT, |
| 127 | "OriginalURL" TEXT, |
| 128 | "HID" INTEGER, |
| 129 | "IsOldCounter" TINYINT, |
| 130 | "IsEvent" TINYINT, |
| 131 | "IsParameter" TINYINT, |
| 132 | "DontCountHits" TINYINT, |
| 133 | "WithHash" TINYINT, |
| 134 | "HitColor" TEXT, |
| 135 | "LocalEventTime" TIMESTAMP, |
| 136 | "Age" TINYINT, |
| 137 | "Sex" TINYINT, |
| 138 | "Income" TINYINT, |
| 139 | "Interests" SMALLINT, |
| 140 | "Robotness" TINYINT, |
| 141 | "RemoteIP" INTEGER, |
| 142 | "WindowName" INTEGER, |
| 143 | "OpenerName" INTEGER, |
| 144 | "HistoryLength" SMALLINT, |
| 145 | "BrowserLanguage" TEXT, |
| 146 | "BrowserCountry" TEXT, |
| 147 | "SocialNetwork" TEXT, |
| 148 | "SocialAction" TEXT, |
| 149 | "HTTPError" SMALLINT, |
| 150 | "SendTiming" INTEGER, |
| 151 | "DNSTiming" INTEGER, |
| 152 | "ConnectTiming" INTEGER, |
| 153 | "ResponseStartTiming" INTEGER, |
| 154 | "ResponseEndTiming" INTEGER, |
| 155 | "FetchTiming" INTEGER, |
| 156 | "SocialSourceNetworkID" TINYINT, |
| 157 | "SocialSourcePage" TEXT, |
| 158 | "ParamPrice" BIGINT, |
| 159 | "ParamOrderID" TEXT, |
| 160 | "ParamCurrency" TEXT, |
| 161 | "ParamCurrencyID" SMALLINT, |
| 162 | "OpenstatServiceName" TEXT, |
| 163 | "OpenstatCampaignID" TEXT, |
| 164 | "OpenstatAdID" TEXT, |
| 165 | "OpenstatSourceID" TEXT, |
| 166 | "UTMSource" TEXT, |
| 167 | "UTMMedium" TEXT, |
| 168 | "UTMCampaign" TEXT, |
| 169 | "UTMContent" TEXT, |
| 170 | "UTMTerm" TEXT, |
| 171 | "FromTag" TEXT, |
| 172 | "HasGCLID" TINYINT, |
| 173 | "RefererHash" BIGINT, |
| 174 | "URLHash" BIGINT, |
| 175 | "CLID" INTEGER |
| 176 | ); |
| 177 | ``` |
| 178 | |
| 179 | # How to prepare data |
| 180 | |
| 181 | Download the 100 million rows dataset from here and insert into ClickHouse: |
| 182 | https://clickhouse.tech/docs/en/getting-started/example-datasets/metrica/ |
| 183 | |
| 184 | Create the dataset from ClickHouse: |
| 185 | |
| 186 | ``` |
| 187 | SELECT |
| 188 | toInt64(WatchID) = -9223372036854775808 ? -9223372036854775807 : toInt64(WatchID), |
| 189 | toInt8(JavaEnable) = -128 ? -127 : toInt8(JavaEnable), |
| 190 | toValidUTF8(toString(Title)), |
| 191 | toInt16(GoodEvent) = -32768 ? -32767 : toInt16(GoodEvent), |
| 192 | EventTime, |
| 193 | EventDate, |
| 194 | toInt32(CounterID) = -2147483648 ? -2147483647 : toInt32(CounterID), |
| 195 | toInt32(ClientIP) = -2147483648 ? -2147483647 : toInt32(ClientIP), |
| 196 | toInt32(RegionID) = -2147483648 ? -2147483647 : toInt32(RegionID), |
| 197 | toInt64(UserID) = -9223372036854775808 ? -9223372036854775807 : toInt64(UserID), |
| 198 | toInt8(CounterClass) = -128 ? -127 : toInt8(CounterClass), |
| 199 | toInt8(OS) = -128 ? -127 : toInt8(OS), |
| 200 | toInt8(UserAgent) = -128 ? -127 : toInt8(UserAgent), |
| 201 | toValidUTF8(toString(URL)), |
| 202 | toValidUTF8(toString(Referer)), |
| 203 | toInt8(Refresh) = -128 ? -127 : toInt8(Refresh), |
| 204 | toInt16(RefererCategoryID) = -32768 ? -32767 : toInt16(RefererCategoryID), |
| 205 | toInt32(RefererRegionID) = -2147483648 ? -2147483647 : toInt32(RefererRegionID), |
| 206 | toInt16(URLCategoryID) = -32768 ? -32767 : toInt16(URLCategoryID), |
| 207 | toInt32(URLRegionID) = -2147483648 ? -2147483647 : toInt32(URLRegionID), |
| 208 | toInt16(ResolutionWidth) = -32768 ? -32767 : toInt16(ResolutionWidth), |
| 209 | toInt16(ResolutionHeight) = -32768 ? -32767 : toInt16(ResolutionHeight), |
| 210 | toInt8(ResolutionDepth) = -128 ? -127 : toInt8(ResolutionDepth), |
| 211 | toInt8(FlashMajor) = -128 ? -127 : toInt8(FlashMajor), |
| 212 | toInt8(FlashMinor) = -128 ? -127 : toInt8(FlashMinor), |
| 213 | toValidUTF8(toString(FlashMinor2)), |
| 214 | toInt8(NetMajor) = -128 ? -127 : toInt8(NetMajor), |
| 215 | toInt8(NetMinor) = -128 ? -127 : toInt8(NetMinor), |
| 216 | toInt16(UserAgentMajor) = -32768 ? -32767 : toInt16(UserAgentMajor), |
| 217 | toValidUTF8(toString(UserAgentMinor)), |
| 218 | toInt8(CookieEnable) = -128 ? -127 : toInt8(CookieEnable), |
| 219 | toInt8(JavascriptEnable) = -128 ? -127 : toInt8(JavascriptEnable), |
| 220 | toInt8(IsMobile) = -128 ? -127 : toInt8(IsMobile), |
| 221 | toInt8(MobilePhone) = -128 ? -127 : toInt8(MobilePhone), |
| 222 | toValidUTF8(toString(MobilePhoneModel)), |
| 223 | toValidUTF8(toString(Params)), |
| 224 | toInt32(IPNetworkID) = -2147483648 ? -2147483647 : toInt32(IPNetworkID), |
| 225 | toInt8(TraficSourceID) = -128 ? -127 : toInt8(TraficSourceID), |
| 226 | toInt16(SearchEngineID) = -32768 ? -32767 : toInt16(SearchEngineID), |
| 227 | toValidUTF8(toString(SearchPhrase)), |
| 228 | toInt8(AdvEngineID) = -128 ? -127 : toInt8(AdvEngineID), |
| 229 | toInt8(IsArtifical) = -128 ? -127 : toInt8(IsArtifical), |
| 230 | toInt16(WindowClientWidth) = -32768 ? -32767 : toInt16(WindowClientWidth), |
| 231 | toInt16(WindowClientHeight) = -32768 ? -32767 : toInt16(WindowClientHeight), |
| 232 | toInt16(ClientTimeZone) = -32768 ? -32767 : toInt16(ClientTimeZone), |
| 233 | ClientEventTime, |
| 234 | toInt8(SilverlightVersion1) = -128 ? -127 : toInt8(SilverlightVersion1), |
| 235 | toInt8(SilverlightVersion2) = -128 ? -127 : toInt8(SilverlightVersion2), |
| 236 | toInt32(SilverlightVersion3) = -2147483648 ? -2147483647 : toInt32(SilverlightVersion3), |
| 237 | toInt16(SilverlightVersion4) = -32768 ? -32767 : toInt16(SilverlightVersion4), |
| 238 | toValidUTF8(toString(PageCharset)), |
| 239 | toInt32(CodeVersion) = -2147483648 ? -2147483647 : toInt32(CodeVersion), |
| 240 | toInt8(IsLink) = -128 ? -127 : toInt8(IsLink), |
| 241 | toInt8(IsDownload) = -128 ? -127 : toInt8(IsDownload), |
| 242 | toInt8(IsNotBounce) = -128 ? -127 : toInt8(IsNotBounce), |
| 243 | toInt64(FUniqID) = -9223372036854775808 ? -9223372036854775807 : toInt64(FUniqID), |
| 244 | toValidUTF8(toString(OriginalURL)), |
| 245 | toInt32(HID) = -2147483648 ? -2147483647 : toInt32(HID), |
| 246 | toInt8(IsOldCounter) = -128 ? -127 : toInt8(IsOldCounter), |
| 247 | toInt8(IsEvent) = -128 ? -127 : toInt8(IsEvent), |
| 248 | toInt8(IsParameter) = -128 ? -127 : toInt8(IsParameter), |
| 249 | toInt8(DontCountHits) = -128 ? -127 : toInt8(DontCountHits), |
| 250 | toInt8(WithHash) = -128 ? -127 : toInt8(WithHash), |
| 251 | toValidUTF8(toString(HitColor)), |
| 252 | LocalEventTime, |
| 253 | toInt8(Age) = -128 ? -127 : toInt8(Age), |
| 254 | toInt8(Sex) = -128 ? -127 : toInt8(Sex), |
| 255 | toInt8(Income) = -128 ? -127 : toInt8(Income), |
| 256 | toInt16(Interests) = -32768 ? -32767 : toInt16(Interests), |
| 257 | toInt8(Robotness) = -128 ? -127 : toInt8(Robotness), |
| 258 | toInt32(RemoteIP) = -2147483648 ? -2147483647 : toInt32(RemoteIP), |
| 259 | toInt32(WindowName) = -2147483648 ? -2147483647 : toInt32(WindowName), |
| 260 | toInt32(OpenerName) = -2147483648 ? -2147483647 : toInt32(OpenerName), |
| 261 | toInt16(HistoryLength) = -32768 ? -32767 : toInt16(HistoryLength), |
| 262 | toValidUTF8(toString(BrowserLanguage)), |
| 263 | toValidUTF8(toString(BrowserCountry)), |
| 264 | toValidUTF8(toString(SocialNetwork)), |
| 265 | toValidUTF8(toString(SocialAction)), |
| 266 | toInt16(HTTPError) = -32768 ? -32767 : toInt16(HTTPError), |
| 267 | toInt32(SendTiming) = -2147483648 ? -2147483647 : toInt32(SendTiming), |
| 268 | toInt32(DNSTiming) = -2147483648 ? -2147483647 : toInt32(DNSTiming), |
| 269 | toInt32(ConnectTiming) = -2147483648 ? -2147483647 : toInt32(ConnectTiming), |
| 270 | toInt32(ResponseStartTiming) = -2147483648 ? -2147483647 : toInt32(ResponseStartTiming), |
| 271 | toInt32(ResponseEndTiming) = -2147483648 ? -2147483647 : toInt32(ResponseEndTiming), |
| 272 | toInt32(FetchTiming) = -2147483648 ? -2147483647 : toInt32(FetchTiming), |
| 273 | toInt8(SocialSourceNetworkID) = -128 ? -127 : toInt8(SocialSourceNetworkID), |
| 274 | toValidUTF8(toString(SocialSourcePage)), |
| 275 | toInt64(ParamPrice) = -9223372036854775808 ? -9223372036854775807 : toInt64(ParamPrice), |
| 276 | toValidUTF8(toString(ParamOrderID)), |
| 277 | toValidUTF8(toString(ParamCurrency)), |
| 278 | toInt16(ParamCurrencyID) = -32768 ? -32767 : toInt16(ParamCurrencyID), |
| 279 | toValidUTF8(toString(OpenstatServiceName)), |
| 280 | toValidUTF8(toString(OpenstatCampaignID)), |
| 281 | toValidUTF8(toString(OpenstatAdID)), |
| 282 | toValidUTF8(toString(OpenstatSourceID)), |
| 283 | toValidUTF8(toString(UTMSource)), |
| 284 | toValidUTF8(toString(UTMMedium)), |
| 285 | toValidUTF8(toString(UTMCampaign)), |
| 286 | toValidUTF8(toString(UTMContent)), |
| 287 | toValidUTF8(toString(UTMTerm)), |
| 288 | toValidUTF8(toString(FromTag)), |
| 289 | toInt8(HasGCLID) = -128 ? -127 : toInt8(HasGCLID), |
| 290 | toInt64(RefererHash) = -9223372036854775808 ? -9223372036854775807 : toInt64(RefererHash), |
| 291 | toInt64(URLHash) = -9223372036854775808 ? -9223372036854775807 : toInt64(URLHash), |
| 292 | toInt32(CLID) = -2147483648 ? -2147483647 : toInt32(CLID) |
| 293 | FROM hits_100m_obfuscated |
| 294 | INTO OUTFILE '/home/milovidov/example_datasets/hits_100m_obfuscated_monetdb.tsv' |
| 295 | FORMAT TSV; |
| 296 | ``` |
| 297 | |
| 298 | Note that MonetDB does not support the most negative numbers like -128. And we have to convert them by adding one. |
| 299 | It makes impossible to store the values of 64bit identifiers in BIGINT. |
| 300 | Maybe it's a trick to optimize NULLs? |
| 301 | |
| 302 | Upload the data: |
| 303 | |
| 304 | ``` |
| 305 | $ mclient -u monetdb -d test |
| 306 | ``` |
| 307 | |
| 308 | Type password: monetdb |
| 309 | |
| 310 | ``` |
| 311 | COPY INTO hits FROM '/home/milovidov/example_datasets/hits_100m_obfuscated_monetdb.tsv' USING DELIMITERS '\t'; |
| 312 | ``` |
| 313 | |
| 314 | It takes 28 minutes 02 seconds on a server (Linux Ubuntu, Xeon E5-2560v2, 32 logical CPU, 128 GiB RAM, 8xHDD RAID-5, 40 TB). |
| 315 | It is roughly 60 000 rows per second. |
| 316 | |
| 317 | Validate the data: |
| 318 | |
| 319 | ``` |
| 320 | SELECT count(*) FROM hits; |
| 321 | ``` |
| 322 | |
| 323 | Create an index: |
| 324 | |
| 325 | ``` |
| 326 | CREATE INDEX hits_idx ON hits ("CounterID", "EventDate"); |
| 327 | ``` |
| 328 | |
| 329 | (it takes 5 seconds) |
| 330 | |
| 331 | Run the benchmark: |
| 332 | |
| 333 | ``` |
| 334 | ./benchmark.sh | tee log.txt |
| 335 | ``` |
| 336 | |
| 337 | You can find the log in `log.txt` file. |
| 338 | |
| 339 | Postprocess data: |
| 340 | |
| 341 | ``` |
| 342 | grep clk log.txt | tr -d '\r' | awk '{ if ($3 == "ms") { print $2 / 1000; } else if ($3 == "sec") { print $2 } else { print } }' |
| 343 | ``` |
| 344 | |
| 345 | Then replace values with "min" (minutes) timing manually and save to `tmp.txt`. |
| 346 | Then process to JSON format: |
| 347 | |
| 348 | ``` |
| 349 | awk '{ |
| 350 | if (i % 3 == 0) { a = $1 } |
| 351 | else if (i % 3 == 1) { b = $1 } |
| 352 | else if (i % 3 == 2) { c = $1; print "[" a ", " b ", " c "]," }; |
| 353 | ++i; }' < tmp.txt |
| 354 | ``` |
| 355 | |
| 356 | And paste to `/website/benchmark/dbms/results/005_monetdb.json` in the repository. |