My experience in Spark tuning. All tests were run in a production environment (a 600+ node Hadoop cluster). The tuning results are useful for Spark SQL use cases.
MongoDB background and specifics.
It also covers how to use MongoDB security
and basic MongoDB operations with pymongo.
This document introduces MongoDB's features and limitations, the concepts of sharding and replication, and how to set up and use security.
Apache CarbonData & Spark Meetup
Build 1 trillion warehouse based on CarbonData
Huawei
Apache Spark™ is a unified analytics engine for large-scale data processing.
CarbonData is a high-performance data solution that supports various data analytics scenarios, including BI analysis, ad-hoc SQL queries, fast filter lookups on detail records, streaming analytics, and so on. CarbonData has been deployed in many enterprise production environments; in one of the largest deployments, it supports queries on a single table with 3 PB of data (more than 5 trillion records) with response times under 3 seconds.
The Construction and Practice of Apache Pegasus in Offline and Online Scenari...
A presentation by Wei Wang at the Apache Pegasus meetup in 2022.
Apache Pegasus is a horizontally scalable, strongly consistent and high-performance key-value store.
Learn more about Pegasus at https://pegasus.apache.org and https://github.com/apache/incubator-pegasus
This document summarizes a LAMPer themed sharing session (LAMP人主题分享交流会) on the challenges and optimizations of new-generation behavioral targeting advertising technology. It is the 12th session in the series and includes a special session on brand interaction. Contact information is provided for the LAMPER website, QQ group, and Weibo account.
1. Flume is a distributed system for collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows.
2. Flume supports reliable transport of log data from diverse sources to centralized data stores like HDFS. It can import data from social networks, web servers, and other applications.
3. The document discusses key Flume concepts like sources, sinks, and agents and how they can be configured using decorators to batch, compress, and checksum events for reliable data transport.
This document contains information about iPinyou, including that it was founded in 2008 and has offices in China. It also references various big data technologies like Hadoop, Redis, HBase, Hive and Pig that iPinyou utilizes. Contact information is provided at the end for the human resources department.
- This document discusses techniques for building a high performance and scalable web architecture including load balancing, caching, databases, and more.
- It recommends using technologies like LVS, HAProxy, Nginx for load balancing, Memcached/Redis for caching, and MySQL, MongoDB for databases.
- It also provides examples of system architectures with load balancers, web servers, databases, and caches arranged in clusters with options for failover and redundancy.
The document discusses various techniques for loading scripts and advertisements asynchronously and lazily, including using document.write, modifying elements by ID, iframes, and loading scripts on iframe onload events. It also lists some big data technologies like Hadoop, Redis, and Hive and provides contact information for a Chinese company.
1. Flume is a distributed system for collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows.
2. Flume supports reliable transport of log data from diverse sources to destinations such as HDFS, HBase, and Elasticsearch. It can import data from sources such as web servers and social networks and distribute it to various sinks.
3. Flume provides extensibility through its source, channel and sink architecture along with built-in sources, channels and sinks. It also supports batching, compression and other decorators to optimize data transport.
- PapayaMobile is a leading mobile social gaming network with 23 million users and 300+ games. It was founded in 2008 and has raised $22 million.
- PapayaMobile provides tools and services for third party developers to build social games, including social SDKs and a game engine.
- PapayaMobile helps promote new games through its social network of over 23 million users, generating initial downloads and viral growth for games.
11. MyFOX: Data Query
SELECT IF(INSTR(f.keyword, ' ') > 0,
          UPPER(TRIM(f.keyword)),
          CONCAT(b.brand_name, ' ', UPPER(TRIM(f.keyword)))) AS f0,
       SUM(f.search_num) AS f1,
       SUM(f.uv) AS f2,
       ROUND(SUM(f.search_num) / SUM(f.uv), 2) AS f3,
       AVG(f.uv) AS f4
FROM f
INNER JOIN dim_brand b ON f.keyword_brand_id = b.brand_id
WHERE f.keyword_type_id = 1 AND f.keyword != ''
  AND keyword_cat_id IN ('50002535')
  AND thedate <= '2011-03-10'
  AND thedate >= '2011-03-08'
GROUP BY f0
ORDER BY SUM(f.search_num) DESC LIMIT 0, 1500
13. MyFOX Routing Layer: Semantic Understanding
WHERE thedate <= '2011-03-10'
  AND thedate > '2011-03-07'
  AND toprank_id IN (2, 3)

thedate      toprank_id = 2                                toprank_id = 3
2011-03-08   {"toprank_id":"2", "thedate":"2011-03-08"}    {"toprank_id":"3", "thedate":"2011-03-08"}
2011-03-09   {"toprank_id":"2", "thedate":"2011-03-09"}    {"toprank_id":"3", "thedate":"2011-03-09"}
2011-03-10   {"toprank_id":"2", "thedate":"2011-03-10"}    {"toprank_id":"3", "thedate":"2011-03-10"}
14. MyFOX Routing Layer: Field Rewriting
SELECT a AS f0,
       SUM(f.search_num) AS f1,
       SUM(f.uv) AS f2,
       ROUND(SUM(f.search_num) / SUM(f.uv), 2) AS f3,
       AVG(f.uv) AS f4
- AVG(a)
- 1 + SUM(a)
- SELECT a FROM … ORDER BY b
- Duplicate query columns
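
The slide lists AVG(a) as a field that needs rewriting but does not say how. One common approach in sharded query engines, assumed here rather than taken from the slides, is to ask each shard for SUM(a) and COUNT(a) and compute the average at merge time, because averaging per-shard averages gives the wrong result when shards hold different row counts. The names `rewrite_avg` and `merge_avg` and the sample numbers are hypothetical.

```python
def rewrite_avg(column):
    """Rewrite AVG(column) into the two aggregates each shard should return."""
    return [f"SUM({column})", f"COUNT({column})"]


def merge_avg(partials):
    """Combine per-shard (sum, count) pairs into one global average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


# Hypothetical partial results from three shards: (SUM(a), COUNT(a)).
shard_results = [(10.0, 2), (30.0, 3), (5.0, 5)]

global_avg = merge_avg(shard_results)
# Averaging the per-shard averages instead weights every shard equally.
naive_avg = sum(s / c for s, c in shard_results) / len(shard_results)

print(rewrite_avg("a"))
print(global_avg, naive_avg)
```

The same idea would presumably extend to expressions such as 1 + SUM(a), where only the inner SUM(a) is sent to the shards and the outer arithmetic is applied after merging.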