The document discusses optimization techniques for deep learning frameworks on Intel CPUs and on the Arm-based architecture of the Fugaku supercomputer. It introduces oneDNN, a performance library for deep learning operations on Intel CPUs, discusses the limitations of a plain C++ implementation, and shows how just-in-time assembly generation with Xbyak addresses them by generating code tailored to runtime parameters. It also introduces Xbyak_aarch64 for generating optimized code for Fugaku's Arm Scalable Vector Extension (SVE) instructions.
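The core idea, emitting code specialized to the parameters seen at runtime instead of running one generic compiled path, can be sketched in Python. This is an illustration of the concept only, not oneDNN or Xbyak code; the real library uses Xbyak to emit x86 machine instructions at runtime:

```python
# Conceptual sketch of JIT specialization (not Xbyak): generate source
# for a dot product fully unrolled for a length known only at runtime,
# then compile and return it as a callable.
def make_dot(n):
    body = " + ".join(f"a[{i}]*b[{i}]" for i in range(n))
    src = f"def dot(a, b):\n    return {body}\n"
    ns = {}
    exec(compile(src, "<jit>", "exec"), ns)
    return ns["dot"]

dot3 = make_dot(3)  # specialized: no loop, no length check at call time
print(dot3([1, 2, 3], [4, 5, 6]))  # 32
```

Xbyak applies the same pattern one level lower: instead of Python source, the generator emits machine code whose unrolling, vector width, and blocking match the layer's actual shapes.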
Paper introduction: Deep Analysis of CNN-Based Spatio-Temporal Representations for Action Recognition (Toru Tamaki)
Chun-Fu Richard Chen, Rameswar Panda, Kandan Ramakrishnan, Rogerio Feris, John Cohn, Aude Oliva, Quanfu Fan; Deep Analysis of CNN-Based Spatio-Temporal Representations for Action Recognition, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 6165-6175
https://openaccess.thecvf.com/content/CVPR2021/html/Chen_Deep_Analysis_of_CNN-Based_Spatio-Temporal_Representations_for_Action_Recognition_CVPR_2021_paper.html
LLJVM is a library that translates LLVM bitcode into JVM bytecode. It was originally created to optimize Python UDFs in PySpark: the UDFs are compiled to LLVM bitcode with Numba, and that bitcode is then translated so it can run directly on the JVM. However, LLJVM currently supports only a limited set of LLVM instructions and data types; it focuses on translating simple Numba-generated bitcode and on providing runtime support functions. Translating more complex UDFs could improve PySpark performance significantly by avoiding serialization overhead and enabling whole-stage code generation.
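As a toy illustration of the translation step (my own sketch, not LLJVM's actual API or instruction coverage), a few LLVM-style three-address arithmetic instructions can be lowered to JVM-style stack opcodes:

```python
# Hypothetical sketch: lower LLVM-like three-address instructions
# (dst = op a, b) to JVM-like stack opcodes. Names are illustrative;
# the real translator walks LLVM bitcode and emits JVM bytecode.
OPCODES = {"add": "iadd", "sub": "isub", "mul": "imul"}

def lower(instrs):
    out = []
    for dst, op, a, b in instrs:
        # push both operands, apply the stack operation, store the result
        out += [f"iload {a}", f"iload {b}", OPCODES[op], f"istore {dst}"]
    return out

print(lower([("t0", "add", "x", "y")]))
```

The register-to-stack translation shown here is the easy part; the limited instruction and type coverage mentioned above is what restricts which UDFs can be translated today.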
The document introduces Apache Spark v2.3 and Hivemall-on-Spark v0.5.0. It covers new features in Spark v2.3, including Structured Streaming, image support, and performance improvements for Pandas UDFs, then gives an overview of Hivemall-on-Spark, which lets users run Hivemall machine learning functions on Spark DataFrames/SQL and ships utilities that make them easier to use. The author then demonstrates building a logistic regression model on sample data with Hivemall-on-Spark to classify documents. Ongoing work on optimizing feature selection by rewriting Spark plans before feature extraction is also discussed.
This document discusses integrating XGBoost with Spark and DataFrames. It provides examples of using XGBoost on Spark to train models on distributed data and to make predictions on streaming data in parallel. It also discusses future work, such as using Rabit for parallel learning, supporting more platforms such as Windows, and integrating with Spark ML pipelines.
A x86-optimized rank&select dictionary for bit sequences (Takeshi Yamamuro)
The document summarizes a technique for efficiently performing rank and select operations on bit sequences using succinct data structures. The bit sequence is split into superblocks and blocks of (poly)logarithmic size, and cumulative counts are precomputed into two arrays, L and S, so that rank runs in O(1) and select in O(log N) time using only o(N) extra space, where N is the length of the bit sequence; counting within a block uses the "Four Russians" table-lookup technique. Performance test results show the optimized implementation outperforms existing libraries.
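A minimal Python sketch of the rank side of this layout follows. The array names L and S match the summary; the block sizes are small fixed constants here rather than log N and log² N, and the in-block step uses a plain sum as a stand-in for a Four-Russians table lookup or hardware popcount:

```python
BLOCK = 8    # small-block size (log N in the real structure)
SUPER = 32   # superblock size (log^2 N in the real structure)

def build(bits):
    """Precompute L (count of 1s before each superblock) and
    S (count within the current superblock before each block)."""
    L, S = [], []
    total = in_super = 0
    for i, b in enumerate(bits):
        if i % SUPER == 0:
            L.append(total)
            in_super = 0
        if i % BLOCK == 0:
            S.append(in_super)
        total += b
        in_super += b
    return L, S

def rank1(bits, L, S, i):
    """Number of 1s in bits[0:i]: two table lookups plus one count
    over a single small block (a popcount in the optimized version)."""
    return L[i // SUPER] + S[i // BLOCK] + sum(bits[(i // BLOCK) * BLOCK:i])
```

Because L and S store one entry per superblock and per block, the extra space shrinks relative to N as the block sizes grow, which is where the o(N) bound comes from.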
1) VAST-Tree is a new data structure that uses vector-based and compressed techniques to enable highly parallel tree traversal on modern hardware.
2) It classifies tree branches into different layers and applies different compression techniques, such as prefix/suffix bit truncation, to each; this allows multiple keys to be compared simultaneously using SIMD instructions.
3) Experiments on real Twitter data show that VAST-Tree achieves better compression ratios and higher throughput than existing techniques such as FAST, by compressing branch nodes dynamically while keeping comparison errors small.
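The effect of bit truncation can be sketched in Python (my illustration, not the paper's code): branch comparisons use only the top bits of each key, so many truncated keys could be packed into one SIMD register, and the comparison error this introduces is repaired by a short corrective scan over full keys at the bottom:

```python
def truncate(key, total_bits, keep_bits):
    """Keep only the top keep_bits bits of a total_bits-bit key."""
    return key >> (total_bits - keep_bits)

def lower_bound(sorted_keys, query, total_bits=16, keep_bits=4):
    tq = truncate(query, total_bits, keep_bits)
    # Branch phase: binary search on truncated keys (a scalar stand-in
    # for SIMD comparisons over compressed branch nodes).
    lo, hi = 0, len(sorted_keys)
    while lo < hi:
        mid = (lo + hi) // 2
        if truncate(sorted_keys[mid], total_bits, keep_bits) < tq:
            lo = mid + 1
        else:
            hi = mid
    # Correction phase: the lost low bits can only misplace the answer
    # within the run of keys whose truncated value ties with tq, so a
    # short forward scan on full keys restores the exact position.
    while lo < len(sorted_keys) and sorted_keys[lo] < query:
        lo += 1
    return lo
```

The trade-off in the paper is exactly the one visible here: keeping fewer bits packs more keys per comparison but lengthens the correction scan, so the truncation width is chosen to keep that error region small.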