CodeCommitsIssuesPull requestsActionsInsightsSecurity
d8d7c46711b2d765bda8dc6405d59ad08f1aceff

Branches

Tags

  • No tags available.
0Branches0Tags
Go to file
Add file
Code

Clone

HTTPS

Download ZIP

benchmark/monetdb/instruction.md

356lines · modecode

1Go to https://www.monetdb.org/
2
3Dowload now.
4Latest binary releases.
5Ubuntu & Debian.
6
7https://www.monetdb.org/downloads/deb/
8
9Go to the server where you want to install MonetDB.
10```
11$ sudo mcedit /etc/apt/sources.list.d/monetdb.list
12```
13Write:
14```
15deb https://dev.monetdb.org/downloads/deb/ bionic monetdb
16```
17
18```
19$ wget --output-document=- https://www.monetdb.org/downloads/MonetDB-GPG-KEY | sudo apt-key add -
20
21$ sudo apt update
22$ sudo apt install monetdb5-sql monetdb-client
23
24$ sudo systemctl enable monetdbd
25$ sudo systemctl start monetdbd
26$ sudo usermod -a -G monetdb $USER
27```
28
29Logout and login back to your server.
30
31Tutorial:
32https://www.monetdb.org/Documentation/UserGuide/Tutorial
33
34Creating the database:
35
36```
37$ sudo mkdir /opt/monetdb
38$ sudo chmod 777 /opt/monetdb
39$ monetdbd create /opt/monetdb
40
41$ monetdbd start /opt/monetdb
42cannot remove socket files
43```
44
45Now you have to stop MonetDB, copy the contents of `/var/monetdb5` to `/opt/monetdb` and replace the `/var/monetdb5` with symlink to `/opt/monetdb`. This is necessary, because I don't have free space in `/var` and creation of database in `/opt` did not succeed.
46
47Start MonetDB again.
48
49```
50$ sudo systemctl start monetdbd
51```
52
53```
54$ monetdb create test
55created database in maintenance mode: test
56
57$ monetdb release test
58taken database out of maintenance mode: test
59```
60
61Run client:
62```
63$ mclient -u monetdb -d test
64```
65
66Type password: monetdb
67
68```
69CREATE TABLE hits
70(
71 "WatchID" BIGINT,
72 "JavaEnable" TINYINT,
73 "Title" TEXT,
74 "GoodEvent" SMALLINT,
75 "EventTime" TIMESTAMP,
76 "EventDate" Date,
77 "CounterID" INTEGER,
78 "ClientIP" INTEGER,
79 "RegionID" INTEGER,
80 "UserID" BIGINT,
81 "CounterClass" TINYINT,
82 "OS" TINYINT,
83 "UserAgent" TINYINT,
84 "URL" TEXT,
85 "Referer" TEXT,
86 "Refresh" TINYINT,
87 "RefererCategoryID" SMALLINT,
88 "RefererRegionID" INTEGER,
89 "URLCategoryID" SMALLINT,
90 "URLRegionID" INTEGER,
91 "ResolutionWidth" SMALLINT,
92 "ResolutionHeight" SMALLINT,
93 "ResolutionDepth" TINYINT,
94 "FlashMajor" TINYINT,
95 "FlashMinor" TINYINT,
96 "FlashMinor2" TEXT,
97 "NetMajor" TINYINT,
98 "NetMinor" TINYINT,
99 "UserAgentMajor" SMALLINT,
100 "UserAgentMinor" TEXT,
101 "CookieEnable" TINYINT,
102 "JavascriptEnable" TINYINT,
103 "IsMobile" TINYINT,
104 "MobilePhone" TINYINT,
105 "MobilePhoneModel" TEXT,
106 "Params" TEXT,
107 "IPNetworkID" INTEGER,
108 "TraficSourceID" TINYINT,
109 "SearchEngineID" SMALLINT,
110 "SearchPhrase" TEXT,
111 "AdvEngineID" TINYINT,
112 "IsArtifical" TINYINT,
113 "WindowClientWidth" SMALLINT,
114 "WindowClientHeight" SMALLINT,
115 "ClientTimeZone" SMALLINT,
116 "ClientEventTime" TIMESTAMP,
117 "SilverlightVersion1" TINYINT,
118 "SilverlightVersion2" TINYINT,
119 "SilverlightVersion3" INTEGER,
120 "SilverlightVersion4" SMALLINT,
121 "PageCharset" TEXT,
122 "CodeVersion" INTEGER,
123 "IsLink" TINYINT,
124 "IsDownload" TINYINT,
125 "IsNotBounce" TINYINT,
126 "FUniqID" BIGINT,
127 "OriginalURL" TEXT,
128 "HID" INTEGER,
129 "IsOldCounter" TINYINT,
130 "IsEvent" TINYINT,
131 "IsParameter" TINYINT,
132 "DontCountHits" TINYINT,
133 "WithHash" TINYINT,
134 "HitColor" TEXT,
135 "LocalEventTime" TIMESTAMP,
136 "Age" TINYINT,
137 "Sex" TINYINT,
138 "Income" TINYINT,
139 "Interests" SMALLINT,
140 "Robotness" TINYINT,
141 "RemoteIP" INTEGER,
142 "WindowName" INTEGER,
143 "OpenerName" INTEGER,
144 "HistoryLength" SMALLINT,
145 "BrowserLanguage" TEXT,
146 "BrowserCountry" TEXT,
147 "SocialNetwork" TEXT,
148 "SocialAction" TEXT,
149 "HTTPError" SMALLINT,
150 "SendTiming" INTEGER,
151 "DNSTiming" INTEGER,
152 "ConnectTiming" INTEGER,
153 "ResponseStartTiming" INTEGER,
154 "ResponseEndTiming" INTEGER,
155 "FetchTiming" INTEGER,
156 "SocialSourceNetworkID" TINYINT,
157 "SocialSourcePage" TEXT,
158 "ParamPrice" BIGINT,
159 "ParamOrderID" TEXT,
160 "ParamCurrency" TEXT,
161 "ParamCurrencyID" SMALLINT,
162 "OpenstatServiceName" TEXT,
163 "OpenstatCampaignID" TEXT,
164 "OpenstatAdID" TEXT,
165 "OpenstatSourceID" TEXT,
166 "UTMSource" TEXT,
167 "UTMMedium" TEXT,
168 "UTMCampaign" TEXT,
169 "UTMContent" TEXT,
170 "UTMTerm" TEXT,
171 "FromTag" TEXT,
172 "HasGCLID" TINYINT,
173 "RefererHash" BIGINT,
174 "URLHash" BIGINT,
175 "CLID" INTEGER
176);
177```
178
179# How to prepare data
180
181Download the 100 million rows dataset from here and insert into ClickHouse:
182https://clickhouse.tech/docs/en/getting-started/example-datasets/metrica/
183
184Create the dataset from ClickHouse:
185
186```
187SELECT
188 toInt64(WatchID) = -9223372036854775808 ? -9223372036854775807 : toInt64(WatchID),
189 toInt8(JavaEnable) = -128 ? -127 : toInt8(JavaEnable),
190 toValidUTF8(toString(Title)),
191 toInt16(GoodEvent) = -32768 ? -32767 : toInt16(GoodEvent),
192 EventTime,
193 EventDate,
194 toInt32(CounterID) = -2147483648 ? -2147483647 : toInt32(CounterID),
195 toInt32(ClientIP) = -2147483648 ? -2147483647 : toInt32(ClientIP),
196 toInt32(RegionID) = -2147483648 ? -2147483647 : toInt32(RegionID),
197 toInt64(UserID) = -9223372036854775808 ? -9223372036854775807 : toInt64(UserID),
198 toInt8(CounterClass) = -128 ? -127 : toInt8(CounterClass),
199 toInt8(OS) = -128 ? -127 : toInt8(OS),
200 toInt8(UserAgent) = -128 ? -127 : toInt8(UserAgent),
201 toValidUTF8(toString(URL)),
202 toValidUTF8(toString(Referer)),
203 toInt8(Refresh) = -128 ? -127 : toInt8(Refresh),
204 toInt16(RefererCategoryID) = -32768 ? -32767 : toInt16(RefererCategoryID),
205 toInt32(RefererRegionID) = -2147483648 ? -2147483647 : toInt32(RefererRegionID),
206 toInt16(URLCategoryID) = -32768 ? -32767 : toInt16(URLCategoryID),
207 toInt32(URLRegionID) = -2147483648 ? -2147483647 : toInt32(URLRegionID),
208 toInt16(ResolutionWidth) = -32768 ? -32767 : toInt16(ResolutionWidth),
209 toInt16(ResolutionHeight) = -32768 ? -32767 : toInt16(ResolutionHeight),
210 toInt8(ResolutionDepth) = -128 ? -127 : toInt8(ResolutionDepth),
211 toInt8(FlashMajor) = -128 ? -127 : toInt8(FlashMajor),
212 toInt8(FlashMinor) = -128 ? -127 : toInt8(FlashMinor),
213 toValidUTF8(toString(FlashMinor2)),
214 toInt8(NetMajor) = -128 ? -127 : toInt8(NetMajor),
215 toInt8(NetMinor) = -128 ? -127 : toInt8(NetMinor),
216 toInt16(UserAgentMajor) = -32768 ? -32767 : toInt16(UserAgentMajor),
217 toValidUTF8(toString(UserAgentMinor)),
218 toInt8(CookieEnable) = -128 ? -127 : toInt8(CookieEnable),
219 toInt8(JavascriptEnable) = -128 ? -127 : toInt8(JavascriptEnable),
220 toInt8(IsMobile) = -128 ? -127 : toInt8(IsMobile),
221 toInt8(MobilePhone) = -128 ? -127 : toInt8(MobilePhone),
222 toValidUTF8(toString(MobilePhoneModel)),
223 toValidUTF8(toString(Params)),
224 toInt32(IPNetworkID) = -2147483648 ? -2147483647 : toInt32(IPNetworkID),
225 toInt8(TraficSourceID) = -128 ? -127 : toInt8(TraficSourceID),
226 toInt16(SearchEngineID) = -32768 ? -32767 : toInt16(SearchEngineID),
227 toValidUTF8(toString(SearchPhrase)),
228 toInt8(AdvEngineID) = -128 ? -127 : toInt8(AdvEngineID),
229 toInt8(IsArtifical) = -128 ? -127 : toInt8(IsArtifical),
230 toInt16(WindowClientWidth) = -32768 ? -32767 : toInt16(WindowClientWidth),
231 toInt16(WindowClientHeight) = -32768 ? -32767 : toInt16(WindowClientHeight),
232 toInt16(ClientTimeZone) = -32768 ? -32767 : toInt16(ClientTimeZone),
233 ClientEventTime,
234 toInt8(SilverlightVersion1) = -128 ? -127 : toInt8(SilverlightVersion1),
235 toInt8(SilverlightVersion2) = -128 ? -127 : toInt8(SilverlightVersion2),
236 toInt32(SilverlightVersion3) = -2147483648 ? -2147483647 : toInt32(SilverlightVersion3),
237 toInt16(SilverlightVersion4) = -32768 ? -32767 : toInt16(SilverlightVersion4),
238 toValidUTF8(toString(PageCharset)),
239 toInt32(CodeVersion) = -2147483648 ? -2147483647 : toInt32(CodeVersion),
240 toInt8(IsLink) = -128 ? -127 : toInt8(IsLink),
241 toInt8(IsDownload) = -128 ? -127 : toInt8(IsDownload),
242 toInt8(IsNotBounce) = -128 ? -127 : toInt8(IsNotBounce),
243 toInt64(FUniqID) = -9223372036854775808 ? -9223372036854775807 : toInt64(FUniqID),
244 toValidUTF8(toString(OriginalURL)),
245 toInt32(HID) = -2147483648 ? -2147483647 : toInt32(HID),
246 toInt8(IsOldCounter) = -128 ? -127 : toInt8(IsOldCounter),
247 toInt8(IsEvent) = -128 ? -127 : toInt8(IsEvent),
248 toInt8(IsParameter) = -128 ? -127 : toInt8(IsParameter),
249 toInt8(DontCountHits) = -128 ? -127 : toInt8(DontCountHits),
250 toInt8(WithHash) = -128 ? -127 : toInt8(WithHash),
251 toValidUTF8(toString(HitColor)),
252 LocalEventTime,
253 toInt8(Age) = -128 ? -127 : toInt8(Age),
254 toInt8(Sex) = -128 ? -127 : toInt8(Sex),
255 toInt8(Income) = -128 ? -127 : toInt8(Income),
256 toInt16(Interests) = -32768 ? -32767 : toInt16(Interests),
257 toInt8(Robotness) = -128 ? -127 : toInt8(Robotness),
258 toInt32(RemoteIP) = -2147483648 ? -2147483647 : toInt32(RemoteIP),
259 toInt32(WindowName) = -2147483648 ? -2147483647 : toInt32(WindowName),
260 toInt32(OpenerName) = -2147483648 ? -2147483647 : toInt32(OpenerName),
261 toInt16(HistoryLength) = -32768 ? -32767 : toInt16(HistoryLength),
262 toValidUTF8(toString(BrowserLanguage)),
263 toValidUTF8(toString(BrowserCountry)),
264 toValidUTF8(toString(SocialNetwork)),
265 toValidUTF8(toString(SocialAction)),
266 toInt16(HTTPError) = -32768 ? -32767 : toInt16(HTTPError),
267 toInt32(SendTiming) = -2147483648 ? -2147483647 : toInt32(SendTiming),
268 toInt32(DNSTiming) = -2147483648 ? -2147483647 : toInt32(DNSTiming),
269 toInt32(ConnectTiming) = -2147483648 ? -2147483647 : toInt32(ConnectTiming),
270 toInt32(ResponseStartTiming) = -2147483648 ? -2147483647 : toInt32(ResponseStartTiming),
271 toInt32(ResponseEndTiming) = -2147483648 ? -2147483647 : toInt32(ResponseEndTiming),
272 toInt32(FetchTiming) = -2147483648 ? -2147483647 : toInt32(FetchTiming),
273 toInt8(SocialSourceNetworkID) = -128 ? -127 : toInt8(SocialSourceNetworkID),
274 toValidUTF8(toString(SocialSourcePage)),
275 toInt64(ParamPrice) = -9223372036854775808 ? -9223372036854775807 : toInt64(ParamPrice),
276 toValidUTF8(toString(ParamOrderID)),
277 toValidUTF8(toString(ParamCurrency)),
278 toInt16(ParamCurrencyID) = -32768 ? -32767 : toInt16(ParamCurrencyID),
279 toValidUTF8(toString(OpenstatServiceName)),
280 toValidUTF8(toString(OpenstatCampaignID)),
281 toValidUTF8(toString(OpenstatAdID)),
282 toValidUTF8(toString(OpenstatSourceID)),
283 toValidUTF8(toString(UTMSource)),
284 toValidUTF8(toString(UTMMedium)),
285 toValidUTF8(toString(UTMCampaign)),
286 toValidUTF8(toString(UTMContent)),
287 toValidUTF8(toString(UTMTerm)),
288 toValidUTF8(toString(FromTag)),
289 toInt8(HasGCLID) = -128 ? -127 : toInt8(HasGCLID),
290 toInt64(RefererHash) = -9223372036854775808 ? -9223372036854775807 : toInt64(RefererHash),
291 toInt64(URLHash) = -9223372036854775808 ? -9223372036854775807 : toInt64(URLHash),
292 toInt32(CLID) = -2147483648 ? -2147483647 : toInt32(CLID)
293FROM hits_100m_obfuscated
294INTO OUTFILE '/home/milovidov/example_datasets/hits_100m_obfuscated_monetdb.tsv'
295FORMAT TSV;
296```
297
298Note that MonetDB does not support the most negative numbers like -128. And we have to convert them by adding one.
299It makes impossible to store the values of 64bit identifiers in BIGINT.
300Maybe it's a trick to optimize NULLs?
301
302Upload the data:
303
304```
305$ mclient -u monetdb -d test
306```
307
308Type password: monetdb
309
310```
311COPY INTO hits FROM '/home/milovidov/example_datasets/hits_100m_obfuscated_monetdb.tsv' USING DELIMITERS '\t';
312```
313
314It takes 28 minutes 02 seconds on a server (Linux Ubuntu, Xeon E5-2560v2, 32 logical CPU, 128 GiB RAM, 8xHDD RAID-5, 40 TB).
315It is roughly 60 000 rows per second.
316
317Validate the data:
318
319```
320SELECT count(*) FROM hits;
321```
322
323Create an index:
324
325```
326CREATE INDEX hits_idx ON hits ("CounterID", "EventDate");
327```
328
329(it takes 5 seconds)
330
331Run the benchmark:
332
333```
334./benchmark.sh | tee log.txt
335```
336
337You can find the log in `log.txt` file.
338
339Postprocess data:
340
341```
342grep clk log.txt | tr -d '\r' | awk '{ if ($3 == "ms") { print $2 / 1000; } else if ($3 == "sec") { print $2 } else { print } }'
343```
344
345Then replace values with "min" (minutes) timing manually and save to `tmp.txt`.
346Then process to JSON format:
347
348```
349awk '{
350 if (i % 3 == 0) { a = $1 }
351 else if (i % 3 == 1) { b = $1 }
352 else if (i % 3 == 2) { c = $1; print "[" a ", " b ", " c "]," };
353 ++i; }' < tmp.txt
354```
355
356And paste to `/website/benchmark/dbms/results/005_monetdb.json` in the repository.