Slideshows by User: xuhong
SlideShare feed for Slideshows by User: xuhong
Tue, 23 Apr 2019 19:58:07 GMT

Avro2 tf: a data processing engine for tensorflow
/xuhong/avro2-tf-a-data-processing-engine-for-tensorflow-141821859
To support deep learning effectively at LinkedIn, we first need to address data processing. Most of the datasets used by our ML algorithms (e.g., Photon-ML, LinkedIn's large-scale personalization engine) are in Avro format. Each record in an Avro dataset is essentially a sparse vector that most modern classifiers can consume easily. However, TensorFlow, the leading deep learning package, cannot use this format directly. The main blocker is that the sparse vector is not in the same format as a Tensor. Many companies have vast amounts of ML data in a similar sparse vector format, and the Tensor format is still relatively new to them. Avro2TF bridges this gap by providing a scalable Spark-based transformation and extension mechanism that efficiently converts the data into TF records that TensorFlow can readily consume. With this technology, engineers can improve their productivity by focusing on model building rather than data processing. In this talk, we will go over the data processing issues common to many machine learning pipelines and how we solve them, then take a deep dive into the open-source tool Avro2TF: how it works, its technical architecture, and its usage.
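The gap between a sparse-vector record and a Tensor can be illustrated with a minimal sketch. It is not Avro2TF's code or API. It assumes a hypothetical record layout in which each feature is a (name, term, value) entry, and every function name below is made up for illustration. It shows only the general idea of mapping named features to integer indices so that a record becomes the indices, values, and size of a one-dimensional sparse tensor.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout (an assumption, not taken from the talk):
# a record is a list of features, each with a name, a term, and a value.
Feature = Dict[str, object]
Record = List[Feature]


def build_vocab(records: List[Record]) -> Dict[Tuple[str, str], int]:
    """Assign a stable integer index to every distinct (name, term) pair."""
    keys = sorted({(f["name"], f["term"]) for rec in records for f in rec})
    return {key: i for i, key in enumerate(keys)}


def to_sparse_parts(
    record: Record, vocab: Dict[Tuple[str, str], int]
) -> Tuple[List[int], List[float], int]:
    """Return (indices, values, size): the parts of a 1-D sparse tensor."""
    pairs = sorted(
        (vocab[(f["name"], f["term"])], float(f["value"]))
        for f in record
        if (f["name"], f["term"]) in vocab  # skip features unseen in the vocab
    )
    indices = [i for i, _ in pairs]
    values = [v for _, v in pairs]
    return indices, values, len(vocab)


def to_dense(indices: List[int], values: List[float], size: int) -> List[float]:
    """Expand the sparse parts into a fixed-length dense vector."""
    dense = [0.0] * size
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


records = [
    [
        {"name": "title", "term": "engineer", "value": 1.0},
        {"name": "skill", "term": "spark", "value": 0.5},
    ],
    [{"name": "skill", "term": "tensorflow", "value": 2.0}],
]
vocab = build_vocab(records)
sparse_rows = [to_sparse_parts(rec, vocab) for rec in records]
dense_rows = [to_dense(*parts) for parts in sparse_rows]
```

At scale, the vocabulary would be built by a distributed job and the converted rows written to TF records. This sketch keeps everything in memory to show only the shape of the conversion.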

Posted Tue, 23 Apr 2019 19:58:07 GMT by xuhong
Avro2 tf: a data processing engine for tensorflow from Xuhong Zhang

Hbase hive pig
/slideshow/hbase-hivepig-47517156/47517156
Posted Tue, 28 Apr 2015 10:00:42 GMT by xuhong
Hbase hive pig from Xuhong Zhang

Install hadoop in a cluster
/slideshow/install-hadoop-in-a-cluster/46735694
Posted Tue, 07 Apr 2015 12:38:37 GMT by xuhong
Install hadoop in a cluster from Xuhong Zhang