Sri Krishnamurthy
Fall 2023
Projects
Project 1
PDF Summary Tool: Students are tasked with creating
a Streamlit app that can summarize PDF documents.
They must choose between using nougat or pypdf
libraries to process PDFs from the SEC. The app should
allow users to select the library, test various PDFs,
provide pros/cons for each tool, and recommend one.
Additionally, students need to create and integrate
architectural diagrams of the project within Streamlit.
Data Quality Evaluation Tool: This part involves
building a Streamlit tool using the Freddie Mac single-
family dataset. The tool, designed for data quality
evaluation, should allow users to upload CSV/XLS files
and specify their type (Origination/Monthly
Performance). The tool will use pandas-profiling to
summarize data and Great Expectations to validate data
schema, integrity, and completeness. Architecture
Tools Used
• Streamlit
• Nougat or PyPDF libraries
• pandas-profiling
• Great Expectations
• Diagrams tool for architecture
Data Engineering and building tools to summarize
SEC and Freddie Mac datasets
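The schema and completeness checks described for the Data Quality Evaluation Tool can be illustrated with a minimal sketch. It uses only the standard library as a stand-in for Great Expectations, and the column names (`loan_id`, `credit_score`, `orig_upb`, `first_payment_date`) are hypothetical placeholders, not the actual Freddie Mac field names.

```python
import csv
import io


def check_completeness(csv_text, required_columns):
    """Report missing columns and empty values for the required columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    # Schema check: which required columns are absent from the header?
    missing_columns = [c for c in required_columns if c not in header]
    # Completeness check: count empty cells in the columns that do exist.
    null_counts = {c: 0 for c in required_columns if c in header}
    row_count = 0
    for row in reader:
        row_count += 1
        for col in null_counts:
            if (row.get(col) or "").strip() == "":
                null_counts[col] += 1
    return {
        "row_count": row_count,
        "missing_columns": missing_columns,
        "null_counts": null_counts,
    }


# Hypothetical sample standing in for an uploaded Origination file.
SAMPLE = "loan_id,credit_score,orig_upb\nA1,720,150000\nA2,,98000\n"
report = check_completeness(
    SAMPLE, ["loan_id", "credit_score", "first_payment_date"]
)
print(report)
```

In the Streamlit tool, a report like this would presumably be rendered next to the profiling summary; a real implementation would express the same rules as Great Expectations expectations instead.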
Project 2
A system using Large Language Models to
summarize PDF documents from the SEC website.
The project for the Big Data and Intelligent Analytics
graduate course, as detailed in Assignment 2, involves
developing a tool for analysts to load PDF documents
and obtain summaries.
The project includes evaluating nougat and pypdf
libraries for processing PDFs from the SEC, replicating
a demo from the OpenAI Cookbook, and creating
Jupyter notebooks that can handle SEC PDF
documents. Additionally, students are tasked with
designing FastAPI endpoints for a Streamlit app, updating
the app with new functionalities, and revising design
documents and architectural diagrams to reflect the
updates.
Tools Used
• Streamlit
• Nougat or PyPDF libraries
• FastAPI
• OpenAI APIs
• Great Expectations
• Diagrams tool for architecture
visualization
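Long SEC filings usually exceed what a single LLM request can take, so a summarizer typically splits the extracted text first. The sketch below shows one common way to do that; chunking is an assumption here, since the slide does not say how the summaries are produced. The `summarize` argument is a hypothetical callable standing in for an OpenAI API call.

```python
def chunk_text(text, max_words=200, overlap=20):
    """Split text into word-based chunks that overlap by `overlap` words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def summarize_document(text, summarize, max_words=200, overlap=20):
    """Summarize each chunk, then summarize the combined partial summaries."""
    partial = [summarize(c) for c in chunk_text(text, max_words, overlap)]
    if not partial:
        return ""
    if len(partial) == 1:
        return partial[0]
    return summarize(" ".join(partial))


# Stand-in "model" for illustration only: keeps the first word of its input.
fake_summarize = lambda chunk: chunk.split()[0]
document = " ".join(f"w{i}" for i in range(450))
chunks = chunk_text(document)
summary = summarize_document(document, fake_summarize)
```

The overlap keeps a sentence that straddles a chunk boundary visible to both chunks, at the cost of a few repeated words per request.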
Project 3
Using LLMs and RAG for document summarization of
SEC documents
This project focused on automating the creation of
embeddings and populating a vector database. Key
components include:
Automating Embedding Creation and Database Population:
Airflow Pipelines: Two distinct Airflow pipelines for data
acquisition, embedding generation, and inserting records
into the Pinecone vector database using SEC PDF files.
Data Processing and Validation: Implement data validation,
generate embeddings, and save file extracts.
Client-Facing Application Development:
FastAPI and Streamlit: Develop a user registration and login
system with JWT authentication. Utilize a SQL database for
storing user credentials and application logs.
Streamlit for User Interface: Create a secure login page, a
question-answering interface, and implement a search
mechanism using the Pinecone vector database.
Deployment: Containerize each microservice and deploy on a
public cloud platform.
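To make the JWT authentication step in Project 3 concrete, here is a minimal sketch of issuing and verifying an HS256-signed token using only the standard library. The function names, the `sub`/`exp` claims chosen, and the secret are illustrative assumptions; a real FastAPI service would more likely rely on a maintained JWT library than on hand-rolled code like this.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Restore the stripped padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode("ascii"),
                    hashlib.sha256).digest()


def create_token(username, secret, ttl_seconds=3600, now=None):
    """Build header.payload.signature for the given user."""
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": username, "exp": now + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_token(token, secret, now=None):
    """Return the username if the token is valid and unexpired, else None."""
    now = int(time.time()) if now is None else now
    try:
        signing_input, sig_part = token.rsplit(".", 1)
        _header_part, payload_part = signing_input.split(".")
        actual = _b64url_decode(sig_part)
        # Constant-time comparison avoids leaking signature bytes via timing.
        if not hmac.compare_digest(_sign(signing_input, secret), actual):
            return None
        payload = json.loads(_b64url_decode(payload_part))
    except ValueError:
        return None  # malformed token
    if payload.get("exp", 0) <= now:
        return None  # expired
    return payload.get("sub")


token = create_token("alice", "s3cret", ttl_seconds=60, now=1000)
```

The login endpoint would return such a token after checking credentials against the SQL database, and protected endpoints would call the verification step on each request.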
Tools Used
• Airflow
• Pinecone
• FastAPI
• JWT (JSON Web Token)
• SQL Database
• Streamlit
• Docker for containerization
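The search mechanism in Project 3 relies on nearest-neighbor lookup over embeddings. The sketch below shows the idea with an in-memory dictionary and cosine similarity; it is a stand-in for a Pinecone query, not Pinecone's API, and the document IDs and three-dimensional vectors are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, index, k=2):
    """Return the IDs of the k stored vectors most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), doc_id)
        for doc_id, vector in index.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical index: chunk ID -> embedding (real embeddings are much longer).
INDEX = {
    "10-K-risk": [0.9, 0.1, 0.0],
    "10-K-revenue": [0.1, 0.9, 0.1],
    "8-K-event": [0.0, 0.2, 0.9],
}
results = top_k([0.8, 0.3, 0.0], INDEX, k=2)
print(results)
```

In a RAG flow, the text behind the returned IDs would be passed to the LLM as context for answering the user's question.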
Project 4
Using LLMs to interact with
Snowflake using natural language
Data Engineering with Snowpark Python: Students
individually reproduce steps in creating data pipelines
with Snowpark Python, showcasing their work in a
forked repository.
Dataset Analysis: Teams select datasets from
Snowflake's marketplace, creating thematic stories and
Proof of Concept (POC) to address specific problems.
They design architectural diagrams and implement SQL
processes and User-Defined Functions, integrating Git
actions for deployment.
Streamlit and OpenAI Integration: The project
involves connecting Snowflake with Streamlit for
analytics, developing a text-based SQL query feature
using natural language processing, and integrating
OpenAI services for query generation and refinement.
Tools Used
• Snowpark Python
• Snowflake Marketplace
• Streamlit
• OpenAI Services
• SQL Database Management
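The natural-language query feature in Project 4 can be sketched in two parts: building a prompt that carries the table schema, and guarding the generated SQL before running it. The example below uses the standard-library `sqlite3` module as a stand-in for Snowflake, and a hard-coded string in place of the OpenAI response; the `sales` table, the function names, and the prompt wording are all assumptions for illustration.

```python
import sqlite3


def build_prompt(question, schema):
    """Assemble the text that would be sent to the language model."""
    return (
        "You translate questions into one SQL SELECT statement.\n"
        f"Schema:\n{schema}\n"
        f"Question: {question}\n"
        "SQL:"
    )


def is_read_only(sql):
    """Accept only a single statement that starts with SELECT."""
    stripped = sql.strip().rstrip(";")
    # Conservative check: any remaining semicolon is treated as a second
    # statement, even if it sits inside a string literal.
    return stripped.lower().startswith("select") and ";" not in stripped


def run_query(conn, sql):
    """Execute model-generated SQL only if it passes the read-only guard."""
    if not is_read_only(sql):
        raise ValueError("only single SELECT statements are allowed")
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("west", 250.0), ("east", 50.0)],
)

prompt = build_prompt("Total sales per region?",
                      "sales(region TEXT, amount REAL)")
# Pretend this string came back from the model.
generated_sql = (
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
)
rows = run_query(conn, generated_sql)
print(rows)
```

Rejecting anything other than a single SELECT is a deliberately blunt safeguard; a refinement step could feed database errors back to the model to repair a failed query.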
Project 5
Project redesign and rearchitecture
The project involves a thorough review of the existing
architecture (Assignment 3) and its redesign using two
distinct approaches:
Open Source Components: Utilizing primarily open-source
tools like Hugging Face, LLAMA from Meta, Amazon Bedrock,
etc. The focus is on creating a flexible and customizable
stack that aligns with the dynamic needs of the enterprise.
Enterprise Alternatives to OpenAI Stack: Incorporating
enterprise solutions such as Google Bard, Anthropic, Cohere,
Perplexity, etc. This approach is geared towards leveraging
the robust and reliable frameworks offered by leading tech
organizations.
Architecture Design: Both use cases will have detailed
architecture diagrams showcasing preparation pipelines and
inference aspects.
A comparison of the technologies in terms of hosting and
as-a-service capabilities.
Technology Suitability Analysis: Justification of selected
technologies based on application suitability.
Evaluation of scalability, reliability, and performance
metrics.
Cost Analysis: Detailed breakdown of fixed and variable
costs for both architectures.
Analysis includes hosting, annual licenses, maintenance,
API access, and use-case specific costs (e.g., PDF
processing).
Comparative study of cost structures between the original
and new architectures.
Tools Used
Open Source Components:
Hugging Face: For machine learning and natural language
processing tasks.
LLAMA from Meta: A language model for various analytical
tasks.
Amazon Bedrock: A managed AWS service for accessing
foundation models through an API.
Enterprise Components:
Google Bard: AI-driven data analysis and predictive
modeling.
Anthropic: Advanced AI solutions for complex data tasks.
Cohere: Provides tools for natural language understanding.
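The fixed-versus-variable cost comparison described for Project 5 can be expressed as a small model. The sketch below is only an illustration of the arithmetic: every figure is a made-up placeholder rather than real vendor pricing, and the class and field names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    name: str
    fixed_annual: float       # hosting, annual licenses, maintenance
    cost_per_document: float  # use-case specific cost, e.g. PDF processing
    cost_per_api_call: float  # API access

    def annual_cost(self, documents, api_calls):
        """Fixed costs plus usage-driven variable costs for one year."""
        variable = (documents * self.cost_per_document
                    + api_calls * self.cost_per_api_call)
        return self.fixed_annual + variable


def break_even_api_calls(a, b):
    """API-call volume at which both architectures cost the same.

    Assumes equal per-document cost, so only API pricing differs.
    """
    return (a.fixed_annual - b.fixed_annual) / (
        b.cost_per_api_call - a.cost_per_api_call
    )


# Placeholder numbers for illustration only.
self_hosted = CostModel("self-hosted open source", 12000.0, 0.02, 0.0)
managed_api = CostModel("managed API", 1200.0, 0.02, 0.004)

cost_self_hosted = self_hosted.annual_cost(documents=10_000, api_calls=500_000)
cost_managed = managed_api.annual_cost(documents=10_000, api_calls=500_000)
crossover = break_even_api_calls(self_hosted, managed_api)
```

With these placeholder inputs the managed option is cheaper at low volume and the self-hosted one wins past the crossover point, which is the kind of trade-off the comparative cost study is meant to surface.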


Big Data projects.pdf
