Incidence relations are a ubiquitous form of data, and this presentation begins an exploration of their analysis. Using a simple collaborative filtering scheme over data represented as a bipartite graph of users and foods they like, the presentation is meant as an approachable introduction to the basic ideas of Formal Concept analysis.
1 of 37
Downloaded 11 times
More Related Content
Mathematics of Incidence, part 1: Getting Started with Collaborative Filtering
1. Mathematics ?
of ?
Incidence
part 1: Getting started through Collaborative Filtering
!
Benjamin J. Keller?
bjkeller.github.io?
!
v.1, 26 September 2014
Creative Commons Attribution-ShareAlike 4.0 International License
a
u1
u2
?
s1
s2
uk
sl
?
t
3. Likes represented as bipartite graph
Abby
Brian
Charles
David
apples
bananas
cherries
doughnuts
eggs
bipartite graph (U,V,E) has vertices in disjoint sets U and V with edges (u,v) in E
from vertex u in U to vertex v in V
9. Biclique – a special subgraph
Abby
Brian
Charles
David
apples
bananas
cherries
doughnuts
eggs
A biclique (U,V,E) of a bipartite graph G is a subgraph of G
such that each u in U has an edge (u,v) with each v in V
10. Constructing a biclique 1.1
Abby
Brian
Charles
David
apples
bananas
cherries
doughnuts
eggs
Start with vertices for Brian and Charles
11. Constructing a biclique 1.2
Abby
Brian
Charles
David
apples
bananas
cherries
doughnuts
eggs
Find all foods liked by Brian and Charles
12. Constructing a biclique 2.1
Abby
Brian
Charles
David
apples
bananas
cherries
doughnuts
eggs
Start with vertices for all foods liked by Brian and Charles
13. Constructing a biclique 2.2
Abby
Brian
Charles
David
apples
bananas
cherries
doughnuts
eggs
Find all users who like same foods as Brian and Charles
26. Biclique order
Transitive: if a ≤ b and b ≤ c then a ≤ c
Antisymmetric: if a ≤ b and b ≤ a then a = b
Re?exive: a ≤ a, for all bicliques a
Inherits properties from subset order on sets:
called a partial order
30. We saw that
Abby Charles
David
cherries
doughnuts
eggs
apples
bananas
Brian
Abby
Brian
Charles
David
apples
bananas
cherries
doughnuts
eggs
Abby
Brian
Charles
David
apples
bananas
cherries
doughnuts
eggs
Abby
Brian
Charles
David
apples
bananas
cherries
doughnuts
eggs
Abby
Brian
Charles
David
apples
bananas
cherries
doughnuts
eggs
Abby
Brian
Charles
David
apples
bananas
cherries
doughnuts
eggs
Abby
Brian
Charles
David
apples
bananas
cherries
doughnuts
eggs
involves related
bicliques
but, what about others?
34. It has to be – a "proof"
a
u1
u2
?
s1
s2
uk
sl
?
t
a
u1
u2
?
s1
s2
uk
sl
?
u1
u2
?
s1
s2
uk
sl
?
t
Can recommend food t to user a because there are foods
s1,s2,…,sl liked by user a and users u1,u2,…,uk who also
like food t
Means that there must be a biclique A
involving at least users a and u1,u2,…,uk
and exactly foods s1,s2,…,sl
And, there is also a biclique B involving exactly
users u1,u2,…,uk and at least foods s1,s2,…,sl and t
As user set of A is a superset of the users of B, it must be that A > B
35. Questions to ponder
? What is a "good"
recommendation?
? Serendipity, who?
? And, what's the deal with
Abby and doughnuts,
anyway?
Abby
Brian
Charles
David
apples
bananas
cherries
doughnuts
eggs
Abby
Brian
Charles
David
apples
bananas
cherries
doughnuts
eggs
Abby
Brian
Charles
David
apples
bananas
cherries
doughnuts
eggs
Abby
Brian
Charles
David
apples
bananas
cherries
doughnuts
eggs
Abby
Brian
Charles
David
apples
bananas
cherries
doughnuts
eggs
Abby
Brian
Charles
David
apples
bananas
cherries
doughnuts
eggs
every
biclique
above has
Abby
every
biclique
below has
doughnuts
36. About me and these slides
I am Ben(jamin) Keller. I learn and, sometimes, create through
explaining. I had been involved in a big (US) federally funded
project that had the goal of helping biomedical scientists tell stories
about their experimental observations. The project is long gone, but
I’m still trying to grok how such a thing would work. Much of
biological data comes in the form of observations that are distilled
to something that looks like an incidence relation, which brings us
to this series of presentations.
My goal for the slides is to deal with the mathematics of incidence
in an approachable way, but the intuitive beginnings will eventually
allow us to embrace the more complex later.
37. This work is licensed under a ?
Creative Commons Attribution-ShareAlike 4.0
International License.