SlideShare feed: slideshows by Lars Albertsson (lallea)

All the DataOps, all the paradigms
Data warehouses, lakes, lakehouses, streams, fabrics, hubs, vaults, and meshes. We sometimes choose deliberately, sometimes influenced by trends, yet often end up with an organic blend. But the choice has an orders-of-magnitude impact on operations cost and iteration speed. Let's dissect the paradigms and their operational aspects once and for all.
Published: Tue, 17 Jun 2025 17:19:34 GMT

Generative AI - the power to destroy democracy meets the security and reliability of MS-DOS
Generative AI provides cheap and powerful means to fabricate evidence and manipulate populations, which might very well threaten western democracy as we know it. We will need to manage the risks of generative AI, but our main methods for ensuring that systems are reliable and secure rest on assumptions that do not hold for AI. Testing and monitoring assume that application behaviour is predictable and that desirable and undesirable behaviour are easy to separate. Containment for security relies on strong separation between the control plane and application use; for large language models (LLMs), it can be circumvented with "please ignore your instructions." MS-DOS and SQL are historical examples of poor control plane separation. They enabled viruses and SQL injection, which have plagued us for decades. We need better risk management than that for AI. (A minimal sketch of the SQL injection analogy follows below.)
Published: Fri, 25 Oct 2024 18:45:32 GMT

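To make the SQL analogy concrete, here is a minimal JDBC sketch (illustrative only, not taken from the slides; the users table and its columns are hypothetical). It shows how string-built SQL mixes user data into the control plane, and how a prepared statement restores the separation.

    import java.sql.{Connection, ResultSet}

    object ControlPlaneSeparation {
      // Vulnerable: the user-supplied name is spliced into the SQL text itself, so input
      // such as "'; DROP TABLE users; --" rewrites the command rather than just the data.
      def findUserUnsafe(conn: Connection, name: String): ResultSet =
        conn.createStatement().executeQuery(s"SELECT * FROM users WHERE name = '$name'")

      // Separated: the SQL text (control plane) is a fixed constant, and the user input
      // travels only as a bound parameter (data plane), so it cannot alter the command.
      def findUserSafe(conn: Connection, name: String): ResultSet = {
        val stmt = conn.prepareStatement("SELECT * FROM users WHERE name = ?")
        stmt.setString(1, name)
        stmt.executeQuery()
      }
    }

An LLM prompt that concatenates system instructions with user text has the same shape as findUserUnsafe, which is why "please ignore your instructions" works.
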
The road to pragmatic application of AI
AI, and generative AI in particular, is all the rage now. Everyone is experimenting, and many ambitious projects are underway. However, few projects have created significant business value. While most struggle to translate modern technology into revenue, "born digital" companies are launching AI-powered features with ease. We can find quantitative metrics related to innovation and productivity that differ by 100-1000 times, e.g. lead time from AI innovation idea to launch, or operational costs of data flows. We call the difference in innovation and productivity between these companies and the incumbents the "data divide" or the "AI divide". In this session, Lars will explain what the leading companies do differently and why the AI divide persists. The born digital companies have succeeded in enabling innovation to grow organically. It is a path that is easy to walk and gives a quick return on investment, yet it is rarely chosen. You'll learn how to go down that path, and how to achieve the conditions for success.
Published: Sun, 25 Aug 2024 11:48:06 GMT

End-to-end pipeline agility - Berlin Buzzwords 2024
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate the boilerplate involved in changes that affect whole pipelines. A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long does it take for all downstream pipelines to be adapted to an upstream change?", the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and the worst are measured, the span is often 100x-1000x, sometimes even more. A long time ago, we suffered at Spotify from fear of changing pipelines, because we did not know what the impact might be downstream. We made plans for a technical solution to test pipelines end to end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved the challenge, but in a different context. In this presentation, we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream consumers (a minimal sketch of the idea follows below). Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but it has drawbacks, since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate while keeping the protection of static typing, thereby further improving our agility to modify data pipelines quickly and without fear.
Published: Tue, 11 Jun 2024 15:02:23 GMT

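As an illustration of the idea (a simplified sketch under assumed interfaces, not the framework described in the talk), an end-to-end pipeline test can reuse the production job sequence but point the orchestration at a temporary workspace seeded with small fixture datasets:

    import java.nio.file.{Files, Path}

    // Simplified model: each job reads one input directory and writes one output directory,
    // and a pipeline is the ordered list of jobs that the orchestrator would normally run.
    trait Job {
      def name: String
      def run(input: Path, output: Path): Unit
    }

    object PipelineTestHarness {
      // Run the whole pipeline against a small fixture input inside a temporary workspace,
      // wiring each job's output directory to the next job's input, just as the production
      // workflow orchestration would. The returned path holds the final, end-to-end output.
      def runEndToEnd(jobs: Seq[Job], fixtureInput: Path): Path = {
        val workspace = Files.createTempDirectory("pipeline-e2e-")
        jobs.foldLeft(fixtureInput) { (input, job) =>
          val output = Files.createDirectories(workspace.resolve(job.name))
          job.run(input, output)
          output // becomes the next job's input
        }
      }
    }

A test then asserts on the contents of the final directory; because every job in the chain ran, an upstream change that breaks a downstream job fails the test instead of production.
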
Schema on read is obsolete. Welcome metaprogramming.
How fast can you modify your data collection to include a new field, make all the necessary changes in data processing and storage, and then use that field in analytics or product features? For many companies, the answer is a few quarters, whereas others do it in a day. This data agility latency has a direct impact on companies' ability to innovate with data. Schema-on-read has been a key strategy for lowering that latency: as the community has shifted towards storing data outside relational databases, we no longer need to make a series of schema changes through the whole data chain, coordinated between teams to minimise operational risk. Schema-on-read comes with a cost, however. Errors that we used to catch during testing or in early test deployments can now sneak into production undetected and surface as product errors or hard-to-debug data quality problems later than with schema-on-write solutions. In this presentation, we will show how we have rejected the tradeoff between slow schema change rate and quality to achieve the best of both worlds. By using metaprogramming and versioned pipelines that are tested end-to-end, we can achieve fast schema changes with schema-on-write and the protection of static typing. We will describe the tools in our toolbox - Scalameta, Chimney, Bazel, and custom tools. We will also show how we leverage them to take static typing one step further and differentiate between domain types that share representation, e.g. EmailAddress vs ValidatedEmailAddress or kW vs kWh, while maintaining harmony with data technology ecosystems (a minimal sketch of such domain types follows below).
Published: Sat, 27 Apr 2024 09:25:12 GMT

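To illustrate the kind of domain typing mentioned above (an assumed sketch, not the code from the presentation), Scala value classes give distinct compile-time types to values that share a runtime representation:

    // Distinct domain types over shared representations (Double and String). Mixing them up,
    // e.g. passing a KWh where a KW is expected, becomes a compile error instead of a data bug.
    final case class KW(value: Double) extends AnyVal    // power
    final case class KWh(value: Double) extends AnyVal   // energy
    final case class Hours(value: Double) extends AnyVal // duration

    final case class EmailAddress(value: String) extends AnyVal
    final case class ValidatedEmailAddress(value: String) extends AnyVal

    object Domain {
      // Energy = power * time; the units are enforced by the signature.
      def energy(power: KW, duration: Hours): KWh = KWh(power.value * duration.value)

      // The only way to obtain a ValidatedEmailAddress is to pass validation.
      def validate(email: EmailAddress): Option[ValidatedEmailAddress] =
        if (email.value.contains("@")) Some(ValidatedEmailAddress(email.value)) else None
    }

Here energy(KW(2.0), Hours(3.0)) yields KWh(6.0), while energy(KWh(2.0), Hours(3.0)) does not compile, which is exactly the class of mistake the talk aims to catch before production.
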
Industrialised data - the key to AI success
The DORA research concluded that there are orders-of-magnitude differences in delivery KPIs between leaders and incumbents. In this presentation, we will describe the corresponding "data divide" in data engineering capabilities, and how the leading companies have adopted an industrial approach to data management, enabling them to leap so far ahead. We will explain why "data industrialisation" is a key factor in going from AI prototypes to sustainable value from AI in production. We will also describe a path for companies outside the technology elite to cross the data divide into the industrialised data realm, and share some very honest learnings from helping companies walk that path.
Published: Thu, 25 Apr 2024 14:15:47 GMT

Crossing the data divide
Did you know that the tech elite does not work at all like you do? Most people don't, and don't want to know. The State of DevOps report found a 1000x span in delivery time and reliability between the elite and low performers. There is a similar gap for delivery time of data or ML pipelines to production. The gap in ability to compute datasets is higher, somewhere around a million times. We call this the data divide, or the AI divide. It is widening over time, since most companies are not aware of its width. We will share the principles we applied in the most successful Scandinavian crossing of the data divide. We never explicitly shared or described the principles, nor fully understood them at the time, but it is long overdue to enumerate them explicitly. The presentation will likely be uncomfortable and surprising, because it does not match what you do and what your vendors say. You will have no practical use for the information, since you cannot apply the principles: they contradict many contemporary trends and popular technologies on the market, and you would be unable to overcome the forces of trends, popularity, and vendor messaging. They worked beautifully for us at the time.
Published: Tue, 07 Nov 2023 09:54:38 GMT

Schema management with Scalameta
Scalameta is a library for static analysis and processing of Scala source code, supporting both syntactic and semantic analysis. In this presentation, we explain how Scalameta works and how you can use it for custom code analysis. We demonstrate how we have used Scalameta to automate schema management and privacy protection (a minimal parsing sketch follows below).
Published: Tue, 07 Nov 2023 09:10:42 GMT

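As a small taste of the syntactic side (an assumed sketch against the Scalameta 4.x-style API, not the schema tooling from the talk), one can parse Scala source and list the fields of every case class, which is the raw material for generating or checking schemas:

    import scala.meta._

    object SchemaInspector {
      // Parse Scala source text and map each case class name to its (field, type) pairs.
      def caseClassFields(code: String): Map[String, List[(String, String)]] = {
        val source = code.parse[Source].get
        source.collect {
          case cls: Defn.Class if cls.mods.exists(_.is[Mod.Case]) =>
            val fields = cls.ctor.paramss.flatten.map { param =>
              param.name.value -> param.decltpe.map(_.syntax).getOrElse("<inferred>")
            }
            cls.name.value -> fields
        }.toMap
      }
    }

For example, caseClassFields("case class User(name: String, email: String)") returns a map from User to its two typed fields; the semantic side builds on SemanticDB and resolves symbols and types across files.
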
How to not kill people - Berlin Buzzwords 2023
With the rise of artificial intelligence, we give more control over our lives to software. We thereby introduce new risks, and the fatal Uber crash in 2018 is the first example of AI causing an accidental death. It will be up to us as software engineers to build systems safe and reliable enough to be entrusted with important decisions. Our culture, however, includes praising companies that move fast and break things (Facebook), celebrate principled confrontation (Uber), fake self-driving demonstrations (Tesla), and are right, a lot (Amazon). As an industry, we need to improve radically to meet the challenge, or more people will die. In this presentation, we will look at aviation - the industry most successful at continuously improving safety - and attempt to learn from it. We will look at aviation safety principles, compare them with similar practices in software engineering, and see how we can translate safety principles that have worked well in aviation to the software engineering domain. Video: https://youtu.be/IitY9yZFPSA
Published: Sat, 24 Jun 2023 13:09:43 GMT

Data engineering in 10 years
If only we could predict the future of the software industry, we could make better investments and decisions. We could waste fewer resources on technology and processes we know will not last, or at least be conscious in our decisions to choose solutions with a limited lifetime. It turns out that for data engineering, we can predict the future, because it has already happened. Not in our workplace, but at a few leading companies that are blazing ahead. It has also already happened in the neighbouring field of software engineering, which is two decades ahead of data engineering in process maturity. In this presentation, we will glimpse into the future of data engineering. Data engineering has gone from legacy data warehouses with stored procedures, to big data with Hadoop and data lakes, on to a new form of modern data warehouses and low-code tools, a.k.a. "the modern data stack". Where does it go from here? We will look at the points where data leaders differ from the crowd, combine them with observations on how software engineering has evolved, and see that it all points towards a new, more industrialised form of data engineering - "data factory engineering".
Published: Thu, 10 Nov 2022 10:50:07 GMT

The 7 habits of data effective companies
Are there 10x engineers? Unclear. But there are certainly 1000x companies. If we look at value delivery metrics, we can find spans of 1000x between leaders and companies lagging behind. The DORA research effort showed a 1000x span in availability metrics (see the State of DevOps report). Lego once built a virtual reality game, "Lego Universe", comparable to Minecraft; before launch, they had spent 10000x as much as Mojang. We see similarly huge efficiency spans in the ability to extract value from data. Most data processing involves the creation of datasets - refined data artifacts of business value, e.g. reports, recommendation indexes, or machine learning models. Mature companies with large traditional data warehouses typically create on the order of hundreds of datasets per day. Google produces billions of datasets per day (Goods paper, 2016). Spotify shows more modest figures, only hundreds of thousands (deduced from conference presentations in 2018), but still 1000x more than a typical enterprise. How do they achieve this level of data efficiency? We have spent time with companies at both ends of the spectrum and seen patterns emerge. In this presentation, we will highlight the differences with seven patterns that seem to be crucial for data efficiency. We have also figured out how to achieve a comparable level of efficiency at a small scale, without huge investments, and will share our tricks of the trade.
Published: Wed, 09 Nov 2022 22:22:21 GMT

Holistic data application quality
The quality of data-powered applications depends not only on code, but also on collected data and on models trained on that data. This renders traditional quality assurance inadequate. We will take a look in our toolbox at more holistic tactics that bridge the gap between code quality assurance and data quality assurance.
Sat, 07 May 2022 08:55:36 GMT /slideshow/holistic-data-application-quality-251734945/251734945 lallea@slideshare.net(lallea) Holistic data application quality lallea The quality of data-powered applications depends not only on code, but also on collected data, as well as models trained on data. This renders traditional quality assurance inadequate. We will take a look in our toolbox for more holistic tactics that bridge the gap between code and data quality assurance. <img style="border:1px solid #C3E6D8;float:right;" alt="" src="https://cdn.slidesharecdn.com/ss_thumbnails/holisticdataapplicationquality-220507085537-11fdc826-thumbnail.jpg?width=120&amp;height=120&amp;fit=bounds" /><br> The quality of data-powered applications depends not only on code, but also on collected data, as well as models trained on data. This renders traditional quality assurance inadequate. We will take a look in our toolbox for more holistic tactics that bridge the gap between code and data quality assurance.
Holistic data application quality from Lars Albertsson
Secure software supply chain on a shoestring budget /slideshow/secure-software-supply-chain-on-a-shoestring-budgetpdf/251718170 securesoftwaresupplychainonashoestringbudget-220504123453
The NotPetya, SolarWinds, and Kaseya cybersecurity attacks were all executed by injection of malicious code into software shipped by vendors to thousands of companies. These attacks have made the public more aware of the importance of secure software supply chains. But the path from awareness to ensuring a secure supply chain is long. Developers have gotten used to the convenience of easily downloading third-party software into containers, and it is challenging to tighten supply chain security in a company with a sprawl of open source components. Scling is a small data engineering startup, and since we ask our customers to entrust us with their data, we must take security seriously. We have been securing our software supply chain since the company was founded. We have no venture capital, and our customers expect quick development iteration cycles, so we have solved supply chain security with minimal effort and minimal impact on developer productivity. In this presentation, we share how we have addressed the different supply chain attack vectors, e.g. Python and JVM packages, with technical solutions. We will present how we automate third-party software upgrades to stay up to date with security fixes while minimising the risk of downloading rogue code.
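As one hedged sketch of the general idea - not Scling's actual tooling - a build step can refuse any third-party artifact whose checksum does not match a pinned value; the package name and checksum below are placeholders. Package managers such as pip support the same idea natively through hash-pinned requirements files.

    # Hypothetical sketch: verify a downloaded third-party artifact against a
    # pinned SHA-256 checksum before it is allowed into the build. The file name
    # and checksum are placeholders, not real values.
    import hashlib
    import sys
    from pathlib import Path

    PINNED = {
        "example_package-1.2.3-py3-none-any.whl":
            "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef",
    }

    def sha256(path: Path) -> str:
        digest = hashlib.sha256()
        with path.open("rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        return digest.hexdigest()

    def verify(path: Path) -> None:
        expected = PINNED.get(path.name)
        if expected is None or sha256(path) != expected:
            sys.exit(f"refusing unpinned or modified artifact: {path.name}")

    if __name__ == "__main__":
        verify(Path(sys.argv[1]))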

Wed, 04 May 2022 12:34:53 GMT
Secure software supply chain on a shoestring budget from Lars Albertsson
DataOps - Lean principles and lean practices /slideshow/dataops-lean-principles-and-lean-practices/242582561 dataops-leanprinciplesandleanpractices-210211134745
DataOps is the transformation of data processing from a craft with manual processes to an automated data factory. Lean principles, which have proven successful in manufacturing, are equally applicable to data factories. We will describe how lean principles can be applied in practice for successful data processing.

Thu, 11 Feb 2021 13:47:45 GMT
DataOps - Lean principles and lean practices from Lars Albertsson
AI legal and ethics /slideshow/ai-legal-and-ethics/240937918 ai-legalandethics-210105124132
Slides presented at a panel debate with AI professionals and attorneys.

Tue, 05 Jan 2021 12:41:32 GMT
AI legal and ethics from Lars Albertsson
The right side of speed - learning to shift left /slideshow/the-right-side-of-speed-learning-to-shift-left/240937627 therightsideofspeed-learningtoshiftleft-210105122838
Many disciplines are on the wrong side of speed - there is a tradeoff between development speed and security, data science, compliance, etc. Let us look at disciplines that have succeeded in shifting left by integrating with development, and learn from their successful patterns: testing, DevOps, agile, DataOps.

Tue, 05 Jan 2021 12:28:38 GMT
The right side of speed - learning to shift left from Lars Albertsson
Mortal analytics - Covid-19 and the problem of data quality /slideshow/mortal-analytics-covid19-and-the-problem-of-data-quality/238427425 mortalanalytics-covid-19andtheproblemofdataquality-200909083048
Social media are full of Covid-19 graphs, each pointing to an "obvious" conclusion that fits the author's agenda. Unfortunately, even official sources publish analytics that point to incorrect conclusions. Bad data quality has become a matter of life and death. We look at the quality problems with official Covid-19 data presentations. The problems are common in all domains, and solutions are known but not widespread. We describe tools and patterns that data-mature companies use to assess and improve data quality in similar situations. Mastering data quality and data operations is a prerequisite for building sustainable AI solutions, and we will explain how these patterns fit into machine learning product development.

Wed, 09 Sep 2020 08:30:48 GMT
Mortal analytics - Covid-19 and the problem of data quality from Lars Albertsson
Data ops in practice - Swedish style /slideshow/data-ops-in-practice-swedish-style/238427186 dataopsinpractice-swedishstyle-200909081643
DataOps requires a cultural shift that brings the principles of lean manufacturing and DevOps to data analytics. It breaks down silos between developers, data scientists, and operators, resulting in rapid cycle times and low error rates. At Spotify in 2013, the concept of DataOps did not exist, but the Swedish company needed a way to align the people, processes, and technologies of the data organization to accelerate the development of high-quality analytics. The result was a Swedish-style DataOps, influenced by Scandinavian culture and agile principles, that enabled the company to become a true data-driven leader.

Wed, 09 Sep 2020 08:16:42 GMT
Data ops in practice - Swedish style from Lars Albertsson
The lean principles of data ops /slideshow/the-lean-principles-of-data-ops/237279871 theleanprinciplesofdataops-200727083710
Modern data processing environments resemble factory lines, transforming raw data into valuable data products. The lean principles that have successfully transformed manufacturing are equally applicable to data processing, and are well aligned with the new trend known as DataOps. In this presentation, we will explain how lean and DataOps principles can be implemented as technical data processing solutions and processes in order to eliminate waste and improve data innovation speed. We will go through how to eliminate the following types of waste in data processing systems:

* Cognitive waste - unclear source of truth, dependency sprawl, duplication, ambiguity.
* Operational waste - overhead for deployment, upgrades, and incident recovery.
* Delivery waste - friction and delay in development, testing, and deployment.
* Product waste - misalignment to business value, detachment from use cases, push-driven development, vanity quality assurance.

We will primarily focus on technical solutions, but some of the waste mentioned requires organisational refactoring to eliminate.

Mon, 27 Jul 2020 08:37:10 GMT
The lean principles of data ops from Lars Albertsson
Data democratised /slideshow/data-democratised/204419677 datademocratised-191211123405
New times, new hype. Buzzwords like big data and Hadoop have given way to AI and machine learning. But it's not technology, old or new, nor machine learning that separates the companies that get value from data from the companies that struggle. When big data was at its peak, several young, technology-intensive companies succeeded in absorbing it: they acquired large Hadoop clusters, learned to master data, and created valuable products with machine learning. However, big data has had limited impact at traditional companies, and the list of protracted and expensive data lake and Hadoop projects is long. The key to implementing successful projects that transform data into business value is to democratise data - making it accessible and easy to use within an organisation.

Wed, 11 Dec 2019 12:34:05 GMT
Data democratised from Lars Albertsson
Founder of Scling, offering data-value-as-a-service - a partnership solution for extracting value from data. www.scling.com