Slideshows by user lemire (Daniel Lemire) on SlideShare

Accurate and efficient software microbenchmarks (Thu, 06 Apr 2023)
Software is often improved incrementally. Each software optimization should be assessed with microbenchmarks. In a microbenchmark, we record performance measures such as elapsed time or instruction counts during specific tasks, often in idealized conditions. In principle, the process is easy: if the new code is faster, we adopt it. Unfortunately, there are many pitfalls, such as unrealistic statistical assumptions and poorly designed benchmarks. Abstractions like cloud computing add further challenges. We illustrate effective benchmarking practices with examples.
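The abstract's advice lends itself to a small illustration. Below is a minimal C++ timing harness, our own sketch rather than anything from the slides, that runs a task many times and reports the minimum and the median rather than a single mean, since order statistics resist outliers from interrupts and frequency scaling:

    #include <algorithm>
    #include <chrono>
    #include <cstdio>
    #include <vector>

    // Time a single call of func, in nanoseconds.
    template <typename F>
    double time_ns(F&& func) {
        auto start = std::chrono::steady_clock::now();
        func();
        auto stop = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::nano>(stop - start).count();
    }

    // Run many trials and report order statistics instead of the mean.
    template <typename F>
    void benchmark(const char* name, F&& func, int trials = 1000) {
        std::vector<double> samples(trials);
        for (double& s : samples) s = time_ns(func);
        std::sort(samples.begin(), samples.end());
        std::printf("%s: min %.0f ns, median %.0f ns\n", name,
                    samples.front(), samples[samples.size() / 2]);
    }

    int main() {
        volatile long sink = 0; // keeps the compiler from removing the work
        benchmark("sum", [&] {
            long s = 0;
            for (int i = 0; i < 1000; i++) s += i;
            sink = s;
        });
    }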

Fast indexes with roaring #gomtl-10 (Fri, 15 Nov 2019)
Presentation on Roaring bitmaps for the Go Montreal meetup (Go 10th anniversary). Roaring bitmaps are a standard indexing data structure, widely used in search and database engines. For example, Lucene, the search engine powering Wikipedia's search, relies on Roaring. The Go library roaring implements Roaring bitmaps in Go. It is used in production in several popular systems such as InfluxDB, Pilosa and Bleve, and it is part of the Awesome Go collection. After presenting the library, we will cover some advanced Go topics such as the use of assembly language, unsafe mappings, and so forth.
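To make the talk's subject concrete: the core Roaring idea is to split 32-bit values by their high 16 bits and to store each chunk in whichever container is smaller, a sorted array when sparse or a bitmap when dense. The C++ sketch below is our own illustration of that rule (the real libraries, including the Go roaring package, are considerably more refined):

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // One Roaring container holds the low 16 bits of values sharing the
    // same high 16 bits. Sparse containers are sorted arrays; once an
    // array exceeds 4096 entries, a 65536-bit bitmap is cheaper.
    struct Container {
        static const size_t kArrayMax = 4096;
        std::vector<uint16_t> array;  // sorted, used while sparse
        std::vector<uint64_t> bitmap; // 1024 words, used once dense

        void add(uint16_t low) {
            if (!bitmap.empty()) {
                bitmap[low >> 6] |= uint64_t(1) << (low & 63);
                return;
            }
            auto it = std::lower_bound(array.begin(), array.end(), low);
            if (it != array.end() && *it == low) return; // already present
            array.insert(it, low);
            if (array.size() > kArrayMax) { // convert: array -> bitmap
                bitmap.assign(1024, 0);
                for (uint16_t v : array)
                    bitmap[v >> 6] |= uint64_t(1) << (v & 63);
                array.clear();
            }
        }
    };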

Parsing JSON Really Quickly: Lessons Learned (Sat, 09 Nov 2019)
Our disks and networks can load gigabytes of data per second; we feel strongly that our software should follow suit. Thus we wrote what might be the fastest JSON parser in the world, simdjson. It can parse typical JSON files at speeds of over 2 GB/s on a single commodity Intel core with full validation; it is several times faster than conventional parsers. How did we go so fast? We started with the insight that we should make full use of the SIMD instructions available on commodity processors. These instructions are everywhere, from the ARM chip in your smartphone all the way to server processors. SIMD instructions work on wide registers (e.g., spanning 32 bytes): they are faster because they process more data using fewer instructions. To our knowledge, nobody had ever attempted to produce a full parser for something as complex as JSON by relying primarily on SIMD instructions, and many people were skeptical that it could be done fruitfully. We had to develop interesting new strategies that are generally applicable. In the end, we learned several lessons. Perhaps the most important is a nearly obsessive focus on performance metrics: we constantly measure the impact of the choices we make.
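A flavor of the approach, as a hedged illustration rather than simdjson's actual code: one SSE2 comparison can classify 16 input bytes at once, producing a bitmask of the positions holding a given character. simdjson builds its structural index out of primitives like this:

    #include <emmintrin.h> // SSE2 intrinsics
    #include <cstdint>
    #include <cstdio>

    // Return a 16-bit mask with bit i set when p[i] is a double quote.
    // One comparison instruction inspects 16 bytes of input at once.
    uint32_t quote_mask16(const char* p) {
        __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
        __m128i quotes = _mm_cmpeq_epi8(chunk, _mm_set1_epi8('"'));
        return static_cast<uint32_t>(_mm_movemask_epi8(quotes));
    }

    int main() {
        const char text[17] = "{\"key\": \"value\"}"; // exactly 16 bytes
        std::printf("quote mask: 0x%04x\n", quote_mask16(text));
    }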

Next Generation Indexes For Big Data Engineering (ODSC East 2018) (Wed, 18 Apr 2018)
Maximizing performance in data engineering is a daunting challenge. We present some of our work on designing faster indexes, with a particular emphasis on compressed indexes. Our prior work includes (1) Roaring indexes, which are part of multiple big-data systems such as Spark, Hive, Druid, Atlas, Pinot and Kylin, and (2) EWAH indexes, which are part of Git (GitHub) and included in major Linux distributions. We will present ongoing and future work on how we can process data faster while supporting the diverse systems found in the cloud (with upcoming ARM processors) and multiple programming languages (e.g., Java, C++, Go, Python). We seek to minimize shared resources (e.g., RAM) while exploiting algorithms designed for the single-instruction-multiple-data (SIMD) instructions available on commodity processors. Our end goal is to process billions of records per second per core. The talk is aimed at programmers who want to better understand the performance characteristics of current big-data systems as well as their evolution. Two specific topics will be addressed: (1) the various types of indexes and their performance characteristics and trade-offs: hashing, sorted arrays, bitsets and so forth; (2) index and table compression techniques: binary packing, patched coding, dictionary coding and frame-of-reference.
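Of the compression techniques listed, frame-of-reference with binary packing is the simplest to sketch: store the block minimum once, then each value as an offset using only as many bits as the largest offset requires. The scalar C++ sketch below is illustrative only; production code packs fixed-size blocks with vectorized kernels:

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Frame-of-reference + binary packing, scalar sketch.
    // Stores in[i] - min using b bits each; assumes a non-empty block.
    std::vector<uint64_t> pack(const std::vector<uint32_t>& in,
                               uint32_t& base, int& bits) {
        uint32_t lo = *std::min_element(in.begin(), in.end());
        uint32_t hi = *std::max_element(in.begin(), in.end());
        int b = 1;
        while ((uint64_t(1) << b) <= uint64_t(hi) - lo) b++; // bits per offset
        std::vector<uint64_t> out((in.size() * b + 63) / 64, 0);
        for (size_t i = 0; i < in.size(); i++) {
            uint64_t v = in[i] - lo;
            size_t pos = i * b;
            out[pos >> 6] |= v << (pos & 63);
            if ((pos & 63) + b > 64) // offset straddles a word boundary
                out[(pos >> 6) + 1] |= v >> (64 - (pos & 63));
        }
        base = lo;
        bits = b;
        return out;
    }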

Ingénierie de la performance au sein des mégadonnées (Performance engineering for big data) (Wed, 18 Apr 2018)
Software indexes speed up applications in business intelligence, machine learning and data science. They often determine the performance of big-data applications. Efficient indexes improve not only latency and throughput but also energy consumption. Many indexes use RAM sparingly so that critical data stays close to the processor. It is also desirable to work directly on compressed data, to avoid an extra decoding step. (1) We focus on bitmap indexes, found in a wide range of systems: Oracle, Hive, Spark, Druid, Kylin, Lucene, Elastic, Git... They are a component of systems, such as Wikipedia or GitHub, that millions of users depend on every day. We present recent progress in optimizing bitmap indexes as they are used in current systems, and show by example how to multiply their performance in some cases on processors with advanced SIMD (single instruction, multiple data) instructions. (2) We also target the integer lists found in B+ trees, inverted indexes and compressed bitmap indexes, and give a recent example of an integer compression technique (Stream VByte) that can decode billions of compressed integers per second.
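The Stream VByte format mentioned at the end separates control bytes from data bytes: each control byte holds four 2-bit length codes, and the corresponding values follow in a dedicated data stream, stored little-endian in 1 to 4 bytes each. A scalar sketch of the decoding step (the fast versions replace this loop with a single table-driven shuffle instruction):

    #include <cstdint>
    #include <cstring>

    // Decode four integers from one Stream VByte control byte plus the
    // data stream; returns the new data pointer. Length code 0 means
    // 1 byte, ..., code 3 means 4 bytes, little-endian.
    const uint8_t* decode_quad(uint8_t control, const uint8_t* data,
                               uint32_t out[4]) {
        for (int i = 0; i < 4; i++) {
            int len = ((control >> (2 * i)) & 3) + 1;
            uint32_t v = 0;
            std::memcpy(&v, data, len); // assumes a little-endian machine
            out[i] = v;
            data += len;
        }
        return data;
    }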

SIMD Compression and the Intersection of Sorted Integers (Wed, 06 Sep 2017)
Sorted lists of integers are commonly used in inverted indexes and database systems. They are often compressed in memory. We can use the SIMD instructions available in common processors to boost the speed of integer compression schemes. Our S4-BP128-D4 scheme uses as little as 0.7 CPU cycles per decoded integer while still providing state-of-the-art compression. However, if the subsequent processing of the integers is slow, the effort spent on optimizing decoding speed can be wasted. To show that it does not have to be so, we (1) vectorize and optimize the intersection of posting lists; (2) introduce the SIMD Galloping algorithm. We exploit the fact that one SIMD instruction can compare 4 pairs of integers at once. We experiment with two TREC text collections, GOV2 and ClueWeb09 (Category B), using logs from the TREC million-query track. We show that using only the SIMD instructions ubiquitous in all modern CPUs, our techniques for conjunctive queries can double the speed of a state-of-the-art approach.
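The building block is easy to illustrate. With SSE, one comparison instruction checks a probe value against four sorted integers at once; SIMD Galloping layers a skipping search on top of primitives like this. A hedged sketch, not the paper's code:

    #include <emmintrin.h> // SSE2 intrinsics
    #include <cstdint>

    // Does the 4-integer block contain the probe value?
    // One vector comparison inspects all four candidates at once.
    bool contains4(const int32_t block[4], int32_t probe) {
        __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(block));
        __m128i eq = _mm_cmpeq_epi32(b, _mm_set1_epi32(probe));
        return _mm_movemask_epi8(eq) != 0; // any lane equal?
    }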

Decoding billions of integers per second through vectorization (Wed, 06 Sep 2017)
In many important applications -- such as search engines and relational database systems -- data is stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time. Therefore, substantial effort has been made to reduce costs associated with compression and decompression. In particular, researchers have exploited the superscalar nature of modern processors and SIMD instructions. Nevertheless, we introduce a novel vectorized scheme called SIMD-BP128 that improves over previously proposed vectorized approaches. It is nearly twice as fast as the previously fastest schemes on desktop processors (varint-G8IU and PFOR). At the same time, SIMD-BP128 saves up to 2 bits per integer. For even better compression, we propose another new vectorized scheme (SIMD-FastPFOR) that has a compression ratio within 10% of a state-of-the-art scheme (Simple-8b) while being two times faster during decoding.
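Part of what makes schemes in this family vectorizable is the differential coding layer: instead of storing x[i] - x[i-1], the D4 variants store x[i] - x[i-4], so four differences share no dependency and one SIMD instruction reconstructs four values. A sketch under the assumption that the array length is a multiple of 4:

    #include <emmintrin.h> // SSE2 intrinsics
    #include <cstddef>
    #include <cstdint>

    // Stride-4 differential coding (the idea behind the D4 variants).
    // Assumes n is a multiple of 4 and n >= 4.
    void delta4_encode(uint32_t* data, size_t n) {
        for (size_t i = n; i-- > 4;) data[i] -= data[i - 4]; // back to front
    }

    void delta4_decode(uint32_t* data, size_t n) {
        for (size_t i = 4; i < n; i += 4) { // one vector add per 4 values
            __m128i prev = _mm_loadu_si128(
                reinterpret_cast<const __m128i*>(data + i - 4));
            __m128i cur = _mm_loadu_si128(
                reinterpret_cast<__m128i*>(data + i));
            _mm_storeu_si128(reinterpret_cast<__m128i*>(data + i),
                             _mm_add_epi32(cur, prev));
        }
    }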

Logarithmic Discrete Wavelet Transform for High-Quality Medical Image Compression (Wed, 29 Mar 2017)
Nowadays, medical image compression is an essential process in eHealth systems. Compressing medical images at high quality is vital to avoid misdiagnosed exams. WAAVES is a promising medical image compression algorithm based on the discrete wavelet transform (DWT) that achieves high compression performance compared to the state of the art. The main aims of this work are to enhance image quality when compressing with WAAVES and to provide a high-speed DWT architecture for image compression on embedded systems. Regarding the quality improvement, the logarithmic number system (LNS) was explored as an alternative to linear arithmetic in DWT computations. A new LNS library was developed and validated to realize the logarithmic DWT. In addition, a new quantization method called LNS-Q, based on logarithmic arithmetic, was proposed. A novel compression scheme (LNS-WAAVES), integrating the Hybrid-DWT and the LNS-Q method with WAAVES, was developed. Hybrid-DWT combines the advantages of the logarithmic and linear domains, enhancing both image quality and compression ratio. The results showed that LNS-WAAVES achieves quality improvements of 8% and up to 34% over WAAVES, depending on the compression configuration parameters and the image modalities.
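For readers unfamiliar with LNS arithmetic: a value is represented by its sign and the logarithm of its magnitude, which turns multiplication into a plain addition; addition itself requires a Gaussian-logarithm correction term. A minimal sketch of the idea (our illustration, unrelated to the LNS library developed in this work):

    #include <algorithm>
    #include <cmath>

    // A value x stored as its sign and log2(|x|).
    struct LNS {
        bool negative;
        double log2mag;
    };

    LNS from_double(double x) { return {x < 0, std::log2(std::fabs(x))}; }
    double to_double(LNS v) {
        return (v.negative ? -1.0 : 1.0) * std::exp2(v.log2mag);
    }

    // Multiplication in LNS is one addition of logarithms.
    LNS mul(LNS a, LNS b) {
        return {a.negative != b.negative, a.log2mag + b.log2mag};
    }

    // Addition of same-sign values uses the Gaussian logarithm:
    // log2(x + y) = hi + log2(1 + 2^(lo - hi)), with hi >= lo.
    LNS add_same_sign(LNS a, LNS b) {
        double hi = std::max(a.log2mag, b.log2mag);
        double lo = std::min(a.log2mag, b.log2mag);
        return {a.negative, hi + std::log2(1.0 + std::exp2(lo - hi))};
    }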

Engineering fast indexes (Deepdive) (Tue, 07 Feb 2017)
Contemporary computing hardware offers massive new performance opportunities. Yet high-performance programming remains a daunting challenge.

Engineering fast indexes (Tue, 07 Feb 2017)
Contemporary computing hardware offers massive new performance opportunities. Yet high-performance programming remains a daunting challenge. We present some of the lessons learned while designing faster indexes, with a particular emphasis on compressed bitmap indexes. Compressed bitmap indexes accelerate queries in popular systems such as Apache Spark, Git, Elastic, Druid and Apache Kylin.

MaskedVByte: SIMD-accelerated VByte (Tue, 08 Mar 2016)
We consider the ubiquitous technique of VByte compression, which represents each integer as a variable-length sequence of bytes. The low 7 bits of each byte encode a portion of the integer, and the high bit of each byte is reserved as a continuation flag. This flag is set to 1 for all bytes except the last, and the decoding of each integer is complete when a byte with a high bit of 0 is encountered. VByte decoding can be a performance bottleneck, especially when the unpredictable lengths of the encoded integers cause frequent branch mispredictions. Previous attempts to accelerate VByte decoding using SIMD vector instructions have been disappointing, prodding search engines such as Google to use more complicated but faster-to-decode formats for performance-critical code. Our decoder (Masked VByte) is 2 to 4 times faster than a conventional scalar VByte decoder, making the format once again competitive with regard to speed. Jeff Plaisance, Nathan Kurz, Daniel Lemire, Vectorized VByte Decoding, International Symposium on Web Algorithms 2015. http://arxiv.org/pdf/1503.07387.pdf
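The format described above decodes with a few lines of scalar code; this conventional baseline (our sketch, not the Masked VByte SIMD kernel) is also where the branch mispredictions come from, since the loop trip count varies per integer:

    #include <cstdint>

    // Decode one VByte-encoded integer; returns a pointer past its bytes.
    // Each byte contributes 7 payload bits; a clear high bit ends the value.
    const uint8_t* vbyte_decode_one(const uint8_t* in, uint32_t* out) {
        uint32_t value = 0;
        int shift = 0;
        uint8_t byte;
        do {
            byte = *in++;
            value |= uint32_t(byte & 0x7F) << shift;
            shift += 7;
        } while (byte & 0x80); // data-dependent branch, hard to predict
        *out = value;
        return in;
    }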

Roaring Bitmaps (January 2016) (Wed, 20 Jan 2016)
Better bitmap performance with Roaring bitmaps. Bitmaps are used to implement fast set operations in software. They are frequently found in databases and search engines. Without compression, bitmaps scale poorly, so they are often compressed. Many bitmap compression techniques have been proposed, almost all relying primarily on run-length encoding (RLE). For example, Oracle relies on BBC bitmap compression, while the version control system Git and Apache Hive rely on EWAH compression. We can get superior performance with a hybrid compression technique that uses both uncompressed bitmaps and packed arrays inside a two-level tree. An instance of this technique, Roaring, has been adopted by several production platforms (e.g., Apache Lucene/Solr/Elastic, Apache Spark, eBay's Apache Kylin and Metamarkets' Druid). Overall, our implementation of Roaring can be several times faster (up to two orders of magnitude) than implementations of traditional RLE-based alternatives (WAH, Concise, EWAH) while compressing better. We review the design choices and optimizations that make these good results possible.
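One reason the hybrid layout performs well: when two dense chunks meet, their intersection is pure word-parallel arithmetic over fixed-size bitmaps. A sketch of that inner kernel (illustrative, not the library's code):

    #include <cstdint>

    // Intersect two 65536-bit bitmap containers: 1024 word-wise ANDs,
    // counting the resulting cardinality with a population-count intrinsic.
    int bitmap_and(const uint64_t a[1024], const uint64_t b[1024],
                   uint64_t out[1024]) {
        int cardinality = 0;
        for (int i = 0; i < 1024; i++) {
            out[i] = a[i] & b[i];
            cardinality += __builtin_popcountll(out[i]); // GCC/Clang builtin
        }
        return cardinality;
    }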

Roaring Bitmap: June 2015 report (Tue, 16 Jun 2015)
Research report on the development of the Roaring bitmap index.

La vectorisation des algorithmes de compression (vectorizing compression algorithms)
Mon, 10 Nov 2014 16:07:22 GMT, https://fr.slideshare.net/slideshow/la-vectorisation-des-algorithmes-de-compression/41375604

Since the introduction of the Pentium 4, our processors have offered vector instructions. By explicitly taking these instructions into account when designing our algorithms, we can greatly accelerate computations. As an example, consider the compression of integer lists as it is performed inside most search engines and databases. In this area, we often still rely on algorithms developed in the 1970s. We explain how we can do much better, in terms of speed, by exploiting vector instructions.
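As a taste of what vector instructions buy in this setting, here is a small C sketch, written for this listing rather than taken from the talk: decompressing a sorted integer list typically ends with a prefix sum over deltas, and SSE2 intrinsics let us process four 32-bit values per step.

#include <stdint.h>
#include <stdio.h>
#include <emmintrin.h> /* SSE2 */

/* In-register prefix sum of four 32-bit integers, plus a running carry. */
static __m128i prefix_sum(__m128i x, __m128i carry) {
    x = _mm_add_epi32(x, _mm_slli_si128(x, 4)); /* shift by one lane  */
    x = _mm_add_epi32(x, _mm_slli_si128(x, 8)); /* shift by two lanes */
    return _mm_add_epi32(x, carry);
}

/* Turn deltas back into absolute values; n must be a multiple of 4. */
void delta_decode(uint32_t *data, size_t n) {
    __m128i carry = _mm_setzero_si128();
    for (size_t i = 0; i < n; i += 4) {
        __m128i x = _mm_loadu_si128((__m128i *)(data + i));
        x = prefix_sum(x, carry);
        _mm_storeu_si128((__m128i *)(data + i), x);
        /* broadcast the last decoded value as the next carry */
        carry = _mm_shuffle_epi32(x, _MM_SHUFFLE(3, 3, 3, 3));
    }
}

int main(void) {
    uint32_t deltas[8] = {5, 1, 2, 2, 10, 1, 1, 5};
    delta_decode(deltas, 8);
    for (int i = 0; i < 8; i++)
        printf("%u ", deltas[i]); /* prints: 5 6 8 10 20 21 22 27 */
    printf("\n");
    return 0;
}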
OLAP and more
Tue, 25 Feb 2014 17:57:43 GMT, /slideshow/olap-and-more/31643396
Decoding billions of integers per second through vectorization
Thu, 01 Nov 2012 15:23:45 GMT, /lemire/decoding-billions-of-integers-per-second-through-vectorization

In many important applications, such as search engines and relational database systems, data is stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time. Therefore, substantial effort has been made to reduce costs associated with compression and decompression. In particular, researchers have exploited the superscalar nature of modern processors and SIMD instructions. Nevertheless, we introduce a novel vectorized scheme called SIMD-BP128 that improves over previously proposed vectorized approaches. It is nearly twice as fast as the previously fastest schemes on desktop processors (varint-G8IU and PFOR). At the same time, SIMD-BP128 saves up to 2 bits per integer. For even better compression, we propose another new vectorized scheme (SIMD-FastPFOR) that has a compression rate within 10% of a state-of-the-art scheme (Simple-8b) while being two times faster during decoding.
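The core format idea behind binary-packing schemes such as SIMD-BP128 can be sketched in scalar C. This is a simplified skeleton assumed for illustration, not the actual SIMD-BP128 code: each block of integers is stored using b bits per value, where b is the bit width of the block's largest value.

#include <stdint.h>
#include <stdio.h>

/* bits needed to represent v */
static uint32_t bit_width(uint32_t v) {
    uint32_t b = 0;
    while (v) { b++; v >>= 1; }
    return b;
}

/* Pack n integers using b bits each; returns the number of words written. */
size_t pack(const uint32_t *in, size_t n, uint32_t b, uint32_t *out) {
    uint64_t buf = 0; uint32_t used = 0; size_t w = 0;
    for (size_t i = 0; i < n; i++) {
        buf |= (uint64_t)in[i] << used;
        used += b;
        while (used >= 32) { out[w++] = (uint32_t)buf; buf >>= 32; used -= 32; }
    }
    if (used) out[w++] = (uint32_t)buf;
    return w;
}

/* Unpack n integers of b bits each. */
void unpack(const uint32_t *in, size_t n, uint32_t b, uint32_t *out) {
    uint64_t buf = 0; uint32_t avail = 0; size_t r = 0;
    uint32_t mask = (b == 32) ? 0xFFFFFFFFu : ((1u << b) - 1);
    for (size_t i = 0; i < n; i++) {
        if (avail < b) { buf |= (uint64_t)in[r++] << avail; avail += 32; }
        out[i] = (uint32_t)buf & mask;
        buf >>= b; avail -= b;
    }
}

int main(void) {
    uint32_t in[8] = {3, 7, 2, 5, 6, 1, 0, 4}, out[8] = {0}, back[8];
    uint32_t b = 0;
    for (int i = 0; i < 8; i++) { uint32_t w = bit_width(in[i]); if (w > b) b = w; }
    size_t words = pack(in, 8, b, out); /* 8 values * 3 bits = 24 bits, 1 word */
    unpack(out, 8, b, back);
    printf("b=%u, packed into %zu word(s), first value back: %u\n", b, words, back[0]);
    return 0;
}

The real SIMD-BP128 packs blocks of 128 integers and arranges the bits so that 128-bit vector registers can shift many values at once, a layout that differs in detail from this scalar version but follows the same per-block bit-width principle.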
Extracting, Transforming and Archiving Scientific Data
Fri, 16 Sep 2011 13:37:59 GMT, /slideshow/extracting-transforming-and-archiving-scientific-data/9287211

It is becoming common to archive research datasets that are not only large but also numerous. In addition, their corresponding metadata and the software required to analyse or display them need to be archived. Yet the manual curation of research data can be difficult and expensive, particularly in very large digital repositories, hence the importance of models and tools for automating digital curation tasks. The automation of these tasks faces three major challenges: (1) research data and data sources are highly heterogeneous, (2) future research needs are difficult to anticipate, (3) data is hard to index. To address these problems, we propose the Extract, Transform and Archive (ETA) model for managing and mechanizing the curation of research data. Specifically, we propose a scalable strategy for addressing the research-data problem, ranging from the extraction of legacy data to its long-term storage. We review some existing solutions and propose novel avenues of research.
Innovation without permission: from Codd to NoSQL
Thu, 07 Apr 2011 14:54:52 GMT, /slideshow/innovation-without-permission-from-codd-to-nosql/7553233

Practitioners often fail to apply textbook database design principles. We observe both a perversion of the relational model and a growth of less formal alternatives. Overall, there is an opposition between the analytic thought that prevailed when many data modeling techniques were initiated, and the pragmatism which now dominates among practitioners. There are at least two recent trends supporting this rejection of traditional models: (1) the rise of the sophisticated user, most notably in social media, is a challenge to the rationalist view, as it blurs the distinction between design and operation; (2) in the new technological landscape, where there are billions of interconnected computers worldwide, simple concepts like consistency sometimes become prohibitively expensive. Overall, for a wide range of information systems, design and operation are becoming integrated in the spirit of pragmatism. Thus, we are left with design methodologies which embrace fast and continual iterations and exploratory testing. These methodologies allow innovation without permission, in that the right to design new features is no longer so closely guarded.
Write good papers
Fri, 26 Mar 2010 13:56:16 GMT, /slideshow/write-good-papers/3565727

How to write good research papers. This talk covers selecting a good title, writing a good abstract, crafting good figures, and more.
Faster Column-Oriented Indexes
Wed, 10 Feb 2010 14:34:15 GMT, /slideshow/faster-columnoriented-indexes/3127104

Recent research results on optimizing column-oriented indexes for faster data warehousing. This talk aims to answer the following question: when is sorting the table a sufficiently good optimization?
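To see why sorting can stand in for heavier index optimizations, consider how run-length encoding behaves on a column before and after sorting. The following small C program is an illustration written for this listing, not code from the talk:

#include <stdio.h>
#include <stdlib.h>

/* number of runs an RLE-compressed column would store */
size_t count_runs(const int *col, size_t n) {
    if (n == 0) return 0;
    size_t runs = 1;
    for (size_t i = 1; i < n; i++)
        if (col[i] != col[i - 1]) runs++;
    return runs;
}

static int cmp_int(const void *a, const void *b) {
    return (*(const int *)a > *(const int *)b) - (*(const int *)a < *(const int *)b);
}

int main(void) {
    enum { N = 100000 };
    int *col = malloc(N * sizeof *col);
    for (size_t i = 0; i < N; i++)
        col[i] = rand() % 10; /* a low-cardinality column, e.g. a category code */
    printf("runs before sorting: %zu\n", count_runs(col, N)); /* roughly 90000 */
    qsort(col, N, sizeof *col, cmp_int);
    printf("runs after sorting:  %zu\n", count_runs(col, N)); /* exactly 10 */
    free(col);
    return 0;
}

On a low-cardinality column, sorting collapses the index to one run per distinct value, which is why simply sorting the table is often a sufficiently good optimization on its own.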
About the author: I've been an entrepreneur, a government researcher and a university professor. I once designed, built and sold software for a living. I work on software performance, data indexing and data science. In recent years, I have focused on fast compression, indexing and hashing techniques. My research is supported by substantial open source programming. lemire.me/