Embulk, an open-source plugin-based parallel bulk data loader
Sadayuki Furuhashi
The document discusses Embulk, an open-source parallel bulk data loader built around plugins. Embulk loads records from various sources ("A") to various targets ("B"), using a separate plugin for each source and target type. This makes the otherwise painful process of data integration much easier. Embulk executes loads in parallel, validates data, handles errors, behaves deterministically, and supports idempotent retries of bulk loads.
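Embulk itself is configured with YAML files, and its plugins are written in Java or Ruby. The Python below is therefore only a minimal sketch of the ideas in the summary and not Embulk's API. Hypothetical `read`/`write` callables stand in for input and output plugins, tasks run in parallel, invalid records are counted and skipped, and a failed task is rerun from the start. The retry is idempotent only under the stated assumption that `write` replaces a task's whole output rather than appending to it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical plugin signatures (not Embulk's real API):
# an input plugin yields the records of one task (input split),
# an output plugin stores the finished batch for that task.
InputPlugin = Callable[[int], Iterable[dict]]
OutputPlugin = Callable[[int, List[dict]], None]


def is_valid(record: dict, required: List[str]) -> bool:
    """A record is valid if every required column is present and non-null."""
    return all(record.get(col) is not None for col in required)


def run_bulk_load(read: InputPlugin, write: OutputPlugin, num_tasks: int,
                  required: List[str], max_retries: int = 3) -> Dict[str, int]:
    """Run one task per input split in parallel and return load statistics."""

    def run_task(task_id: int) -> Tuple[int, int]:
        last_error = None
        for _ in range(max_retries):
            try:
                good, rejected = [], 0
                for record in read(task_id):
                    if is_valid(record, required):
                        good.append(record)
                    else:
                        rejected += 1  # skip bad rows instead of failing the job
                # Assumes write() replaces the task's whole partition, so a
                # retried task leaves the same result as one clean run.
                write(task_id, good)
                return len(good), rejected
            except OSError as err:  # treat I/O errors as transient and retry
                last_error = err
        raise RuntimeError(f"task {task_id} failed {max_retries} times") from last_error

    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        results = list(pool.map(run_task, range(num_tasks)))
    return {"loaded": sum(r[0] for r in results),
            "rejected": sum(r[1] for r in results)}
```

Because each task rebuilds its whole batch before calling `write`, a transient failure midway through reading leaves nothing half-written in this sketch.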
https://2021.pycon.jp/time-table/?id=273396
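Web application development is closely tied to database migrations. Django and SQLAlchemy, both widely used in Python, offer migration features for changing the database schema (in SQLAlchemy's case through its companion tool, Alembic). When implementing a program, developers usually create branches in the repository and work on each branch separately. Web application development is no different, but changing the database schema on individual branches requires care. For example, if several branches each add columns to the same table, or their schema changes otherwise conflict, the branches will conflict when they are merged. The more features are developed in parallel, and the longer branches live before being merged, the more often such conflicts occur.
Using Django as an example, this talk covers how database migrations work, then presents problems that occurred in real-world development and how they were resolved.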
Migration strategies for parallel development of web applications
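The branch conflict described in this talk abstract can be pictured as a graph problem. Django records each migration's dependencies. When two branches each add a migration on top of the same parent, the app ends up with two leaf nodes, and `python manage.py makemigrations --merge` resolves this by writing a merge migration that depends on both. The minimal sketch below models only that graph logic. `leaf_nodes`, `merge`, and the migration names are hypothetical and are not Django's API.

```python
from typing import Dict, List, Set

# Hypothetical model of one app's migration graph:
# each migration name maps to the migrations it depends on.
Graph = Dict[str, List[str]]


def leaf_nodes(graph: Graph) -> Set[str]:
    """Return migrations that no other migration depends on."""
    depended_on = {dep for deps in graph.values() for dep in deps}
    return set(graph) - depended_on


def merge(graph: Graph, name: str) -> Graph:
    """Return a copy of the graph with a merge migration over all current leaves."""
    leaves = sorted(leaf_nodes(graph))
    merged = dict(graph)
    if len(leaves) >= 2:  # only a multi-leaf graph needs merging
        merged[name] = leaves
    return merged


# Both feature branches started from 0001 and each added a migration.
graph = {
    "0001_initial": [],
    "0002_add_price": ["0001_initial"],  # branch A
    "0002_add_stock": ["0001_initial"],  # branch B
}
print(sorted(leaf_nodes(graph)))             # two leaves: the branches conflict
print(leaf_nodes(merge(graph, "0003_merge")))  # one leaf after merging
```

A merge migration only reconciles the graph. If both branches alter the same column, the schema operations themselves still conflict and have to be resolved by hand.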
Online and offline handwritten Chinese character recognition: a comprehensive...
Shuhei Iitsuka
The document presents a study on online and offline handwritten Chinese character recognition. A deep learning model is proposed that incorporates directional decomposition of characters into an 8-direction feature map. This model achieves state-of-the-art results on benchmark datasets while using less memory than comparison methods. The model can also be adapted through an unsupervised adaptation layer to new domains without requiring large labeled datasets.
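The 8-direction decomposition can be sketched for offline (image) input. This minimal version is an assumption and not the paper's exact procedure. It computes each pixel's intensity gradient with `np.gradient`, then splits the gradient magnitude linearly by angle between the two nearest of eight directions, which produces eight feature maps. Angles are measured in array coordinates, so direction 2 points toward increasing row index. The function name `direction_maps` is hypothetical.

```python
import numpy as np


def direction_maps(img: np.ndarray, n_dirs: int = 8) -> np.ndarray:
    """Decompose pixel gradients into n_dirs directional feature maps.

    Direction k sits at angle k * 2*pi / n_dirs; each gradient's magnitude is
    shared between its two neighbouring directions by linear interpolation.
    """
    img = img.astype(float)
    gy, gx = np.gradient(img)           # derivatives along rows, then columns
    mag = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), 2 * np.pi)
    pos = angle / (2 * np.pi / n_dirs)  # fractional direction index
    lo = np.floor(pos).astype(int) % n_dirs
    hi = (lo + 1) % n_dirs
    w_hi = pos - np.floor(pos)          # share given to the next direction
    maps = np.zeros((n_dirs,) + img.shape)
    rows, cols = np.indices(img.shape)
    np.add.at(maps, (lo, rows, cols), mag * (1 - w_hi))
    np.add.at(maps, (hi, rows, cols), mag * w_hi)
    return maps
```

Each map can then be fed to a convolutional network as a separate input channel alongside, or instead of, the raw image.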
Inferring win–lose product network from user behavior
Shuhei Iitsuka
1) The document proposes a new method for analyzing relationships between substitute products using users' browsing and purchase behavior on e-commerce sites. It estimates which products are more attractive than their substitutes.
2) The method was tested on wedding venue data from a Japanese wedding planning site. The competitive and win-lose relationships it inferred between venues correlated well with user survey data.
3) The method also extracts keywords that explain why one product is superior by analyzing reviews from users who chose that product over others. These keywords identified the factors behind a product's superiority more accurately than a simple baseline method.
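As a rough illustration of the underlying idea (not the paper's model), the sketch below assumes each session records the set of products a user browsed and the one they finally chose. The chosen product "wins" against every other product browsed in that session. A directed edge from winner to loser is kept when one side wins the majority of head-to-head comparisons. The session format, `win_lose_edges`, and `min_pairs` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical session format: (products browsed, product finally chosen).
Session = Tuple[List[str], str]


def win_lose_edges(sessions: Iterable[Session],
                   min_pairs: int = 1) -> Dict[Tuple[str, str], float]:
    """Return {(winner, loser): win rate} for pairs compared >= min_pairs times."""
    wins = Counter()
    for browsed, chosen in sessions:
        for other in set(browsed):
            if other != chosen:
                wins[(chosen, other)] += 1  # chosen beat a browsed alternative
    edges = {}
    for (a, b), n_ab in wins.items():
        n_ba = wins.get((b, a), 0)
        total = n_ab + n_ba
        if total >= min_pairs and n_ab > n_ba:  # keep only the majority direction
            edges[(a, b)] = n_ab / total
    return edges


sessions = [(["A", "B", "C"], "A"), (["A", "B"], "A"),
            (["A", "B"], "B"), (["B", "C"], "B")]
print(win_lose_edges(sessions))  # A beats B 2 of 3 times; A and B each beat C
```

Raising `min_pairs` trades coverage for reliability, since pairs seen together only once or twice produce noisy win rates.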
Procedural modeling using autoencoder networks
Shuhei Iitsuka
1) The document proposes using autoencoder neural networks to reduce the dimensionality of procedural modeling parameters for 3D shapes. This creates a lower-dimensional latent space that organizes shapes based on similarity.
2) A user study showed that combining shape features with procedural parameters in the latent space improved the usability of the design system by generating a space organized by shape similarity.
3) Compared with conventional procedural modeling interfaces, the proposed method allows more intuitive exploration of the design space, but it may limit the design space's representational capacity.
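The dimensionality reduction in point 1 can be illustrated with the simplest possible autoencoder. The sketch below assumes a linear encoder and decoder trained by plain gradient descent on squared reconstruction error. The networks in the summarized work are not described here and may differ (for example, by using nonlinear layers), and the function name and hyperparameters are hypothetical. Each row of `X` stands for one shape's procedural parameters, and `(x - mean) @ W_enc` gives that shape's coordinates in the low-dimensional latent space.

```python
import numpy as np


def train_linear_autoencoder(X, latent_dim=2, lr=0.01, epochs=2000, seed=0):
    """Fit a linear encoder/decoder pair by gradient descent.

    Returns (mean, W_enc, W_dec, losses). Encode with (x - mean) @ W_enc and
    decode with z @ W_dec + mean.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    mean = X.mean(axis=0)
    Xc = X - mean                                # centre each parameter
    W_enc = rng.normal(scale=0.1, size=(d, latent_dim))
    W_dec = rng.normal(scale=0.1, size=(latent_dim, d))
    losses = []
    for _ in range(epochs):
        Z = Xc @ W_enc                           # latent codes, shape (n, k)
        R = Z @ W_dec - Xc                       # reconstruction residual
        losses.append(float((R ** 2).sum() / n))  # mean per-sample squared error
        grad_dec = 2 / n * Z.T @ R               # d loss / d W_dec
        grad_enc = 2 / n * Xc.T @ (R @ W_dec.T)  # d loss / d W_enc
        W_dec -= lr * grad_dec
        W_enc -= lr * grad_enc
    return mean, W_enc, W_dec, losses
```

At its optimum, a linear autoencoder spans the same subspace as PCA. Nonlinear layers are therefore what a practical version would add to organize shapes by similarity.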
Generating sentences from a continuous space
Shuhei Iitsuka
1) The document summarizes a research paper that proposed using a variational autoencoder (VAE) model to generate natural language sentences from a continuous latent space.
2) It showed the VAE model could outperform an RNN language model baseline on a missing word imputation task, suggesting the VAE better captures global sentence characteristics.
3) Analysis found that the VAE's latent space encodes sentence topic and length, and that decoding points interpolated between two latent codes yields grammatical sentences, which shows promise for text generation.
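The latent-space pieces of a VAE are small enough to show directly. This minimal NumPy sketch covers three pieces: the reparameterization trick, the closed-form KL term between a diagonal Gaussian posterior and a standard normal prior, and the straight-line interpolation used to walk between two sentences' codes. The RNN encoder and decoder are omitted, and `mu` and `log_var` stand in for the encoder's outputs. All function names are hypothetical.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), keeping z differentiable in mu, sigma."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=-1)


def interpolate(z_start, z_end, steps):
    """Evenly spaced latent codes on the line from z_start to z_end (inclusive)."""
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1 - t) * z_start + t * z_end
```

Decoding each row returned by `interpolate` with the trained decoder would produce the intermediate sentences that the analysis in point 3 refers to.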
Improving the Sensitivity of Online Controlled Experiments by Utilizing Pre-E...
Shuhei Iitsuka
CUPED is a technique that uses pre-experiment data to reduce the variance of metrics in online controlled experiments. It adjusts each metric using a covariate measured on the same users before the experiment, removing the part of the metric's variance that the covariate already explains. Experiments at Bing showed that CUPED reduced metric variance by 50% and reached significant results faster. Its effectiveness depends on how well the covariates predict the metric, and the covariates must be measured before the experiment starts so that the treatment cannot affect them.
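In its single-covariate form, CUPED replaces the metric Y with Y_cuped = Y - theta * (X - mean(X)), where X is the pre-experiment covariate and theta = cov(X, Y) / var(X). Because X - mean(X) averages to zero, the mean of the metric is unchanged, while its variance shrinks to var(Y) * (1 - rho^2), where rho is the correlation between X and Y. The minimal sketch below assumes one covariate and estimates theta on pooled data from all groups. `cuped_adjust` is a hypothetical name.

```python
import numpy as np


def cuped_adjust(y, x):
    """Return the CUPED-adjusted metric y - theta * (x - mean(x)).

    theta = cov(x, y) / var(x) minimises the variance of the adjusted metric.
    x must be measured before the experiment so the treatment cannot affect it.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    theta = np.cov(x, y)[0, 1] / np.var(x, ddof=1)  # both use ddof=1
    return y - theta * (x - x.mean())
```

Treatment and control means are then compared on the adjusted metric exactly as before. Only their variance, and hence the width of the confidence interval, changes.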
This document discusses how machine learning can be used by web developers, designers, and marketers in addition to product engineers. It provides examples of how machine learning can enable interactive data visualization for designers, A/B testing of website variations by developers to determine the best performing version, and mapping of products in multi-dimensional preference spaces based on user behavior logs to help marketers. The conclusion is that machine learning has applications beyond product engineering and can also benefit other roles in web development.
Asia Trend Map: Forecasting “Cool Japan” Content Popularity on Web Data
Shuhei Iitsuka
This document discusses a system called Asia Trend Map that forecasts the popularity of Japanese content such as anime, manga, and games in Asian countries over the next 6 months. The system collects data on these cultural products from Twitter, Wikipedia, and search engines in different Asian languages and uses this web data along with past Japanese sales data to train a model that can predict future trends. The results showed the system could improve its predictive accuracy by combining data sources and that Wikipedia data, especially page content attributes, was particularly helpful for predicting longer term trends.
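The summary does not say which model Asia Trend Map uses. To show how signals from several sources can feed one forecaster, the minimal sketch below swaps in ordinary least squares purely for illustration. It assumes a hypothetical feature matrix with one row per title and one column per signal (for example tweet counts, Wikipedia page views, search volume, and past Japanese sales), and a target of popularity six months later. `fit_trend_model` and `predict` are hypothetical names.

```python
import numpy as np


def _with_intercept(features: np.ndarray) -> np.ndarray:
    """Prepend a column of ones so the model learns an intercept."""
    return np.column_stack([np.ones(len(features)), features])


def fit_trend_model(features: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Least-squares weights [intercept, w_1, ..., w_k] mapping web signals to popularity."""
    weights, *_ = np.linalg.lstsq(_with_intercept(features), target, rcond=None)
    return weights


def predict(weights: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Forecast popularity for new rows of the same feature columns."""
    return _with_intercept(features) @ weights
```

Comparing held-out error with and without a given source's columns is one simple way to check how much that source contributes.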