A Power BI presentation with focus on how to tune your Data Model in order to make the Q & A feature as natural and reliable as possible. These are the slides used din my SharePoint Connect session in Amsterdam 18-20 Nov 1914
1 of 26
Downloaded 13 times
More Related Content
Power BI: Asking questions the natural way using Q & A
2. Power BI: Asking questions the natural way
using Q + A
Bjoern H Rapp
3. About me
Business Developer at Steria Twitter: @bjoern_rapp
SharePoint MVP and MCT Blog: www.sharepointviking.com
10 + years in system dev Email: safetox@hotmail.com
4. AGENDA
Power BI Primer
Q & A
Building Models
Synonyms
Behind the scenes
7. POWER BI familiarity of Office and the power of the cloud
EXCEL 2013 POWER BI for Office 365
Discover Analyze Visualize Share Find Mobile
8. Building Models - Modeling Tips
Fixing Missing Relationships
Renaming Tables and Columns
Choose a default Field Set for each table
Choose a default Label for each table
Fix incorrect data types
Mark year and id columns as Do not Summarize
Choose a data category for each data and geo col
Choose a sort by column for relevant column
20. 1.
Behind the scenes Choosing the best answer
Search for interpretations of
questions
Score and select best interpretation
Select best visual
21. 1.
medal count by country for 2012
Search for interpretations of
questions
Score and select best interpretation
Select best visual
from Olympics.xlxs
show medal[medal count], medal[country]
Where Olympics[year] = 2012
from Olympics.xlxs
show medal[medal count], Olympics[host country]
where olympics[year] = 2012
from Sales.xlxs
show customer[country]
where order[order year] = 2012
22. 1.
medal count by country for 2012
Search for interpretations of
questions
Score and select best interpretation
Select best visual
from Olympics.xlxs
show medal[medal count], medal[country]
Where Olympics[year] = 2012
from Olympics.xlxs
show medal[medal count], Olympics[host country]
where olympics[year] = 2012
from Sales.xlxs
show customer[country]
where order[order year] = 2012
0.86
0.72
0.46
23. 1.
medal count by country for 2012
Search for interpretations of
questions
Score and select best interpretation
Select best visual
from Olympics.xlxs
show medal[medal count], medal[country]
Where Olympics[year] = 2012
from Olympics.xlxs
show medal[medal count], Olympics[host country]
where olympics[year] = 2012
from Sales.xlxs
show customer[country]
where order[order year] = 2012
0.86
0.72
0.46
24. 1.
Influencing the best answer
Search for interpretations of
questions
Score and select best interpretation
Select best visual
Data modeling,
synonyms
Workbooks in site
Data modeling (data types, data
categories)
#3: Q & A is probably one of the most interesting and fascinating features that comes with Power BI. Q and A is about making the data as easily accessible as possible to as many users as possible.
Q & A is all about data exploration and its based on the underlying Data Model just like Power View and Power Map, two of the authoring tools. So the better the data model is, the more sophisticated questions you can ask and the more relevant answers you will get.
1. How do I find the right data that answer my questions at this specific moment of time
There are lots of different data sources, so how do you know which ones to look at.
2. Finding the right answer among a myriad of different prebuilt reports. Sometimes they maybe didnt slice the right way, or maybe you wanted the answer right now, but building the report that gave you the answer would take a while.
So Q & A was built to face these problems, and its all data exploration enabling anyone in the organization to discover and find answers to problems , and more importantly finding the answer by asking the questions using a Natural Language, so you dont have to learn a new expression language or new keywords. You just type as you would to any search engine.
#8: - Power BI , self service alternative for business users
- Quick reports, supplement to the standard reports
Based on well known tools like Excel for the authoring Office 365 for the tooling
Briefly mention the different tools, point out where Q & A fits in.
#10: If you are missing relationships between tables in your model, neither Power View nor Q&A are going to start wildly guessing how to join between those tables if you ask a questions about them. Power Pivot relationships are the cornerstone of a good model. For example, you cannot ask for the total sales for Seattle customers if the relationship between the orders table and the customers table is missing.
#11: While poorly named tables and columns make even manual Power View report authoring difficult, choice of names is even more critical for Q&A. For example, if you have a table named CustomerSummary, which contains a list of your customers, you would need to ask questions like List the customer summaries in Chicago rather than List the customers in Chicago. Not exactly intuitive.
While Q&A can do some basic word breaking and detection of plurals, it is going to trust that your table and column names accurately reflect the content.
Consider also this case: You have a table named Headcount that contains first and last names and employee numbers, and you have another table named Employees contains employee numbers, job numbers and start dates. While this might be decipherable to someone familiar with the model, someone who asks count the employees is going to get a count of the rows from the Employees table, which is probably not what they had in mind, since thats a count of every job each employee has ever had. Better to rename the tables to truly reflect what they actually contain.
#12: The Default Field Set is used for selecting the columns to display by default when asking questions about a table. When you ask for orders for last week, you probably dont want to see all thirty columns in your Orders table.
#13: The Default Label is used in Power View for selecting items in filters. It is also used by Q&A when deciding what kind of graph or chart to use for displaying your results. For example, count orders by customer will show a tabular result if the Default Label is not set for the customers table, since there are too many columns to display by default. Setting it to the Customer Name column will allow the result to be shown in a bar chart. Note that some tables, such as transaction tables, will only have a Default Field Set, and no Default Label.
#14: Imported data can have incorrect data types. In particular, date and number columns that are imported as strings will not be interpreted by Q&A as dates and numbers. You should select the correct data type in Power Pivot.
#15: Power View aggressively aggregates numeric columns by default, so questions like total sales by year can sometimes result in a grand total of sales alongside a grand total of years. If you have specific columns where you do not want this behavior, set the Summarize By property on the column to Do Not Summarize. Be on the lookout for year, month, day, and ID columns, as these are the most frequent problems. Other columns that arent sensible to sum, such as age, could also benefit from setting Summarize By to Do Not Summarize or to Average.
#16: The Data Category provides additional semantic knowledge about the content of a column beyond its mere data type. For example, an integer column might be marked as a Year or as a Zip Code. This information is used by Q&A in two important ways: For visualization selection and for language biases.
First, Q&A uses the Data Category information to help make choices about what kind of visual display to use. For example, it recognizes that columns with date or time Data Categories are typically a good choice for the horizontal axis of a line chart or the play axis of a bubble chart. And it assumes that results containing columns with geographical Data Categories may look good on a map.
Second, Q&A makes some educated guesses about how users are likely to talk about date and geography columns, to help it understand certain types of questions. For example, the when in When was John Smith hired? is almost certain to map to a date column, and the Brown in Count customers in Brown is more likely to be a city than a hair color.
Note that some data categories, such as Year, will need to be entered as Custom Categories.
#17: The Sort By Column property allows sorting on one column to automatically sort by a different column instead. For example, when you ask sort customers by shirt size, you probably want your Shirt Size column to sort by the underlying size number (XS, S, M, L, XL) rather than alphabetically (L, M, S, XL, XS).
#18: The general rule of thumb youll want to follow is this: Each unique thing the user talks about should be represented by exactly one model object (table or column). So, if your users talk about customers, there should be one customer object. And if your users talk about sales, there should be a one sales object. Simple, right? Depending on the shape of the data youre starting with, it can be. There are rich data shaping capabilities available in Power Query if you need them, while many of the more straightforward transformations can happen simply using calculations in Power Pivot.
Here are some of the more common transformations you might need to perform:
Create new tables for multi-column entities
If you have multiple columns that act as a single distinct unit within a larger table, those columns should be split out into their own table. For example, if you have a Contact Name, Contact Title, and Contact Phone column within your Companies table, a better design would be to have a separate Contacts table to contain the Name, Title, and Phone and a link back to the Companies table. That makes it significantly easier ask questions about contacts independently of questions about companies for which they are the contact and improves display flexibility (e.g. Default Field Set and Default Label).
#19: This is the only step which is applicable specifically to Q&A rather than Power View in general. Users often have a variety of terms they use to refer to the same thing, e.g. total sales, net sales, total net sales. Power Pivot allows these synonyms to be added to tables and columns within the model. Do not underestimate the importance of this step. As obvious as you think your table and column names might be, the users of Q&A are asking questions using the vocabulary that first pops into their heads, not picking from a predefined list of columns. The more sensible synonyms you can add, the better.
A word of warning, however: Be aware that youre introducing ambiguity when you add the same synonym to more than one column or table. Q&A will utilize context, where possible, to choose between ambiguous synonyms, but not all questions have sufficient context. For example, when your user asks count the customers, if you have three things with the synonym customer in your model, they might not get the answer they are looking for. In these cases, make sure the primary synonym is unique, as that is what is used in the restatement. It can alert the user to the ambiguity (e.g. a restatement of show the number of archived customer records), hinting they might want to ask it differently.
Excel auto-generates synonyms for you
Excel auto-generates one synonym for each table, column, and field name. This is referred to as the Primary Synonym and appears bolded.
If the name is recognized by Excel, that name is used as the synonym. For example, the Primary Synonym for Cumulative Medal Count is cumulative medal count, and Description is description.
If the name is a concatenation of words recognized by Excel, the name will be split into those individual words. For example, the Primary Synonym for HostID is host id, and Country/Region is country region.
If the name is not recognized by Excel, it is used as-is. For example, the Primary Synonym for NOC is noc, and ccrx123 is ccrx123.
Understanding the Synonyms pane
Which data model objects can have synonyms?
Not every object from the data model will appear in the Synonyms pane. Calculated fields can have synonyms, but hierarchies cannot. If you find this confusing, you can turn off the display of objects for which synonyms cannot be created. To do this, in the Display area at the top of the Diagram, deselect one or more objects.
In the Synonyms pane, you will notice that some are bolded and some are not. The bolded synonyms are the ones auto-generated by Power Pivot and are called Primary Synonyms. The lighter synonyms are those manually created. The auto-generated synonym can be deleted and replaced.
#21: Looking at all the workbooks you added for Q & A, and find all that can answer the question
Score: Finding the best interpretation of the ones found
Choosing the right visual, like table or chart, or map, so whrn you have the right answer how do you choose that particular visual
#22: Hypothetical example, you may have 3 different workbooks in your Q & A that may answer the question
Show me the medal country and the country where the medals came from
Ambiguity, Country also appears in question where the Olympics were held
First step is finding all interpretations, and then Q & A puts scores on the different questions in order to rank them
2 important ones: Coverage of the words,
Semantic distance ( small parts is better or compact interpretations than traversing all your model)
#23: First one, every word in the question was used, semantic distance was close
Had to use an other table
Sales scored poorly because all words were not used
#24: Visualization, ran through set of rules, to determine how to best visualize it
Show 1 measure, 1 member (geo) -> map
Show 2 measures, 1 member -> scatter
Show 1 measure, 1 member -> bar
If it was 1 measure and 1 member was geographic, it will show a map
Other poteentials
Go through different interpretations until
You can influence this
How the data is categorized in Power Pivot will influence it