Association Analysis of Purchase Data - An Application of Market Basket Analysis

To illustrate this application, we will use the Groceries dataset, which ships with the R package arules.

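What data does a market basket analysis use? Purchase transactions recorded by a store. According to the arules documentation, Groceries contains one month of point-of-sale transactions from a local grocery outlet.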

Lesson 1

LIBRARIES USED

#install.packages("arules")
library(arules)
## Warning: package 'arules' was built under R version 3.5.1
## Loading required package: Matrix
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write

Step 1: Loading the dataset

# Load the dataset
data(Groceries)

Lesson 2 - Exploratory analysis

Step 2: Exploring and preparing the data

# Descriptive summary
summary(Groceries)
## transactions as itemMatrix in sparse format with
##  9835 rows (elements/itemsets/transactions) and
##  169 columns (items) and a density of 0.02609146 
## 
## most frequent items:
##       whole milk other vegetables       rolls/buns             soda 
##             2513             1903             1809             1715 
##           yogurt          (Other) 
##             1372            34055 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15 
## 2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55 
##   16   17   18   19   20   21   22   23   24   26   27   28   29   32 
##   46   29   14   14    9   11    4    6    1    1    1    1    3    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   4.409   6.000  32.000 
## 
## includes extended item information - examples:
##        labels  level2           level1
## 1 frankfurter sausage meat and sausage
## 2     sausage sausage meat and sausage
## 3  liver loaf sausage meat and sausage
# Inspect the first 5 transactions
inspect(Groceries[1:5])
##     items                     
## [1] {citrus fruit,            
##      semi-finished bread,     
##      margarine,               
##      ready soups}             
## [2] {tropical fruit,          
##      yogurt,                  
##      coffee}                  
## [3] {whole milk}              
## [4] {pip fruit,               
##      yogurt,                  
##      cream cheese ,           
##      meat spreads}            
## [5] {other vegetables,        
##      whole milk,              
##      condensed milk,          
##      long life bakery product}
# Plot the 20 most frequent items (absolute counts, then relative frequencies)
itemFrequencyPlot(Groceries,topN=20,type="absolute")

itemFrequencyPlot(Groceries,topN=20,type="relative")
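
The density reported by summary(Groceries) can be checked by hand: it is the share of filled cells in the 9835 x 169 transaction-by-item matrix. The sketch below is plain arithmetic on the figures printed above and makes no arules calls; the variable names are illustrative only.

```r
# Figures copied from the summary(Groceries) output
n_transactions <- 9835
n_items        <- 169

# Counts of the five most frequent items plus the "(Other)" bucket
item_counts <- c(2513, 1903, 1809, 1715, 1372, 34055)

total_purchases <- sum(item_counts)                       # all item occurrences
density   <- total_purchases / (n_transactions * n_items) # share of filled cells
mean_size <- total_purchases / n_transactions             # average basket size

print(round(density, 4))
print(round(mean_size, 3))
```

The mean basket size obtained this way should agree with the Mean entry of the length distribution in the summary.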

Lesson 3 - The apriori function

Step 3: Creating association rules

Now we are ready to mine some rules! You always have to specify the minimum support and the minimum confidence that a rule must reach.
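
To make the two thresholds concrete, here is a minimal base R sketch that computes support, confidence, and lift for a single rule. The baskets are hypothetical toy data (not taken from Groceries), and the helper `support()` is an illustrative function, not part of arules.

```r
# Toy baskets (hypothetical data, not taken from Groceries)
baskets <- list(
  c("milk", "bread"),
  c("milk", "yogurt", "cereals"),
  c("bread", "butter"),
  c("milk", "yogurt"),
  c("milk", "bread", "butter")
)

# Share of baskets that contain every item in `items`
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

lhs <- "yogurt"
rhs <- "milk"

supp_rule <- support(c(lhs, rhs), baskets)      # share of baskets with lhs and rhs
conf_rule <- supp_rule / support(lhs, baskets)  # share of lhs baskets that also hold rhs
lift_rule <- conf_rule / support(rhs, baskets)  # confidence relative to the base rate of rhs

print(c(support = supp_rule, confidence = conf_rule, lift = lift_rule))
```

A lift above 1 means the right-hand side shows up more often in baskets containing the left-hand side than in baskets overall.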

As a first attempt we use:

Minimum support of 0.001
Minimum confidence of 0.8

We then show the first 5 rules.
# Mine rule set 1 with the apriori function
Regras1 <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.8))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 9 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.01s].
## writing ... [410 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
# Show the first 5 rules, limiting the printed output to 2 significant digits

options(digits=2)
inspect(Regras1[1:5])
##     lhs                        rhs            support confidence lift
## [1] {liquor,red/blush wine} => {bottled beer} 0.0019  0.90       11.2
## [2] {curd,cereals}          => {whole milk}   0.0010  0.91        3.6
## [3] {yogurt,cereals}        => {whole milk}   0.0017  0.81        3.2
## [4] {butter,jam}            => {whole milk}   0.0010  0.83        3.3
## [5] {soups,bottled beer}    => {whole milk}   0.0011  0.92        3.6
##     count
## [1] 19   
## [2] 10   
## [3] 17   
## [4] 10   
## [5] 11

The summary of the rules gives us some useful information, such as:

The number of rules generated: 410
The distribution of rules by length: most rules have 4 items
The summary of the quality measures: the ranges of support, confidence, and lift
The mining info: the number of transactions mined and the minimum thresholds used

For example, among the transactions that contain yogurt and cereals, 81% also contain whole milk.
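
The figures printed for the rule {yogurt,cereals} => {whole milk} can be reproduced with simple arithmetic. This is a sketch in base R using only numbers shown in the outputs above; because the confidence is entered as printed (rounded to 0.81), the lift is approximate.

```r
n_transactions   <- 9835   # from summary(Groceries)
count_rule       <- 17     # count of {yogurt,cereals} => {whole milk}
confidence_rule  <- 0.81   # as printed, rounded to 2 digits
count_whole_milk <- 2513   # from summary(Groceries)

support_rule <- count_rule / n_transactions        # support = count / transactions
support_rhs  <- count_whole_milk / n_transactions  # base rate of whole milk
lift_rule    <- confidence_rule / support_rhs      # lift = confidence / support(rhs)

# Smallest whole number of transactions that reaches supp = 0.001
min_count <- ceiling(0.001 * n_transactions)

print(round(c(support = support_rule, lift = lift_rule), 4))
print(min_count)
```

Note that the apriori log prints "Absolute minimum support count: 9", which looks like the truncated value of 9.835; the smallest rule counts printed above are 10.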

Different support and confidence thresholds can be set to discover other rules.

Summarizing the mined rules

summary(Regras1)
## set of 410 rules
## 
## rule length distribution (lhs + rhs):sizes
##   3   4   5   6 
##  29 229 140  12 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     3.0     4.0     4.0     4.3     5.0     6.0 
## 
## summary of quality measures:
##     support          confidence        lift          count     
##  Min.   :0.00102   Min.   :0.80   Min.   : 3.1   Min.   :10.0  
##  1st Qu.:0.00102   1st Qu.:0.83   1st Qu.: 3.3   1st Qu.:10.0  
##  Median :0.00122   Median :0.85   Median : 3.6   Median :12.0  
##  Mean   :0.00125   Mean   :0.87   Mean   : 4.0   Mean   :12.3  
##  3rd Qu.:0.00132   3rd Qu.:0.91   3rd Qu.: 4.3   3rd Qu.:13.0  
##  Max.   :0.00315   Max.   :1.00   Max.   :11.2   Max.   :31.0  
## 
## mining info:
##       data ntransactions support confidence
##  Groceries          9835   0.001        0.8
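
The values behind this summary can also be pulled out programmatically. The following is a minimal sketch that re-mines the same rule set under a new name (`rules`, chosen here for illustration) and silences the log with `control = list(verbose = FALSE)`.

```r
library(arules)
data(Groceries)

# Same thresholds as Regras1; verbose = FALSE only silences the log
rules <- apriori(Groceries,
                 parameter = list(supp = 0.001, conf = 0.8),
                 control   = list(verbose = FALSE))

q <- quality(rules)        # data.frame with one row of measures per rule
print(length(rules))       # number of rules
print(table(size(rules)))  # rule length distribution (lhs + rhs)
print(summary(q$lift))     # distribution of lift across the rules
```

Working with `quality()` directly is convenient when the measures need to be filtered or plotted outside of arules.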

Lesson 4 - Creating and evaluating association rules

Creating another rule set

# Mine rule set 2, restricting the rule length with minlen and maxlen
regra2 <- apriori(Groceries, parameter = list(supp=0.002, conf=0.80, minlen = 4, maxlen=6))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5   0.002      4
##  maxlen target   ext
##       6  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 19 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [147 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [8 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

Summarizing regra2

summary(regra2)
## set of 8 rules
## 
## rule length distribution (lhs + rhs):sizes
## 4 5 
## 3 5 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     4.0     4.0     5.0     4.6     5.0     5.0 
## 
## summary of quality measures:
##     support          confidence        lift         count     
##  Min.   :0.00203   Min.   :0.80   Min.   :3.2   Min.   :20.0  
##  1st Qu.:0.00203   1st Qu.:0.81   1st Qu.:3.2   1st Qu.:20.0  
##  Median :0.00224   Median :0.82   Median :3.3   Median :22.0  
##  Mean   :0.00236   Mean   :0.83   Mean   :3.6   Mean   :23.2  
##  3rd Qu.:0.00247   3rd Qu.:0.84   3rd Qu.:4.1   3rd Qu.:24.2  
##  Max.   :0.00315   Max.   :0.89   Max.   :4.6   Max.   :31.0  
## 
## mining info:
##       data ntransactions support confidence
##  Groceries          9835   0.002        0.8
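
The rule length restricted by minlen and maxlen counts the items on both sides of a rule. As a sketch of how this could be verified, the block below re-mines the same rule set under the illustrative name `rules2` and compares the total size with the sizes of each side.

```r
library(arules)
data(Groceries)

rules2 <- apriori(Groceries,
                  parameter = list(supp = 0.002, conf = 0.8, minlen = 4, maxlen = 6),
                  control   = list(verbose = FALSE))

# size() counts the items on both sides of each rule
rule_sizes <- size(rules2)
print(table(rule_sizes))

# The same total obtained side by side: lhs size + rhs size
sides_match <- all(size(lhs(rules2)) + size(rhs(rules2)) == rule_sizes)
print(sides_match)
```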

Inspecting regra2

inspect(regra2)
##     lhs                        rhs                support confidence lift count
## [1] {tropical fruit,                                                           
##      grapes,                                                                   
##      whole milk}            => {other vegetables}  0.0020       0.80  4.1    20
## [2] {other vegetables,                                                         
##      curd,                                                                     
##      domestic eggs}         => {whole milk}        0.0028       0.82  3.2    28
## [3] {pork,                                                                     
##      other vegetables,                                                         
##      butter}                => {whole milk}        0.0022       0.85  3.3    22
## [4] {root vegetables,                                                          
##      other vegetables,                                                         
##      yogurt,                                                                   
##      fruit/vegetable juice} => {whole milk}        0.0020       0.83  3.3    20
## [5] {root vegetables,                                                          
##      whole milk,                                                               
##      yogurt,                                                                   
##      fruit/vegetable juice} => {other vegetables}  0.0020       0.80  4.1    20
## [6] {citrus fruit,                                                             
##      tropical fruit,                                                           
##      root vegetables,                                                          
##      whole milk}            => {other vegetables}  0.0032       0.89  4.6    31
## [7] {citrus fruit,                                                             
##      root vegetables,                                                          
##      other vegetables,                                                         
##      yogurt}                => {whole milk}        0.0023       0.82  3.2    23
## [8] {tropical fruit,                                                           
##      root vegetables,                                                          
##      yogurt,                                                                   
##      rolls/buns}            => {whole milk}        0.0022       0.81  3.2    22

Sorting by highest lift

inspect(sort(regra2, by = "lift"))
##     lhs                        rhs                support confidence lift count
## [1] {citrus fruit,                                                             
##      tropical fruit,                                                           
##      root vegetables,                                                          
##      whole milk}            => {other vegetables}  0.0032       0.89  4.6    31
## [2] {tropical fruit,                                                           
##      grapes,                                                                   
##      whole milk}            => {other vegetables}  0.0020       0.80  4.1    20
## [3] {root vegetables,                                                          
##      whole milk,                                                               
##      yogurt,                                                                   
##      fruit/vegetable juice} => {other vegetables}  0.0020       0.80  4.1    20
## [4] {pork,                                                                     
##      other vegetables,                                                         
##      butter}                => {whole milk}        0.0022       0.85  3.3    22
## [5] {root vegetables,                                                          
##      other vegetables,                                                         
##      yogurt,                                                                   
##      fruit/vegetable juice} => {whole milk}        0.0020       0.83  3.3    20
## [6] {other vegetables,                                                         
##      curd,                                                                     
##      domestic eggs}         => {whole milk}        0.0028       0.82  3.2    28
## [7] {citrus fruit,                                                             
##      root vegetables,                                                          
##      other vegetables,                                                         
##      yogurt}                => {whole milk}        0.0023       0.82  3.2    23
## [8] {tropical fruit,                                                           
##      root vegetables,                                                          
##      yogurt,                                                                   
##      rolls/buns}            => {whole milk}        0.0022       0.81  3.2    22

Finding subsets of interest

Creating rule set 3

regra3 <- apriori( Groceries, parameter = list(supp = 0.002, conf = 0.7, minlen = 2))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.7    0.1    1 none FALSE            TRUE       5   0.002      2
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 19 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [147 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [94 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
summary(regra3)
## set of 94 rules
## 
## rule length distribution (lhs + rhs):sizes
##  3  4  5 
## 22 59 13 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     3.0     4.0     4.0     3.9     4.0     5.0 
## 
## summary of quality measures:
##     support         confidence        lift         count   
##  Min.   :0.0020   Min.   :0.70   Min.   :2.7   Min.   :20  
##  1st Qu.:0.0021   1st Qu.:0.71   1st Qu.:2.8   1st Qu.:21  
##  Median :0.0024   Median :0.74   Median :3.0   Median :24  
##  Mean   :0.0026   Mean   :0.75   Mean   :3.2   Mean   :26  
##  3rd Qu.:0.0027   3rd Qu.:0.77   3rd Qu.:3.5   3rd Qu.:27  
##  Max.   :0.0057   Max.   :0.89   Max.   :4.6   Max.   :56  
## 
## mining info:
##       data ntransactions support confidence
##  Groceries          9835   0.002        0.7
inspect(sort(regra3[1:20], decreasing = TRUE, by = "lift"))
##      lhs                     rhs                support confidence lift count
## [1]  {whipped/sour cream,                                                    
##       soft cheese}        => {other vegetables}  0.0022       0.73  3.8    22
## [2]  {root vegetables,                                                       
##       soft cheese}        => {other vegetables}  0.0024       0.73  3.8    24
## [3]  {citrus fruit,                                                          
##       herbs}              => {other vegetables}  0.0021       0.72  3.7    21
## [4]  {root vegetables,                                                       
##       baking powder}      => {other vegetables}  0.0025       0.71  3.7    25
## [5]  {root vegetables,                                                       
##       rice}               => {other vegetables}  0.0022       0.71  3.7    22
## [6]  {tropical fruit,                                                        
##       herbs}              => {whole milk}        0.0023       0.82  3.2    23
## [7]  {hamburger meat,                                                        
##       curd}               => {whole milk}        0.0025       0.81  3.2    25
## [8]  {herbs,                                                                 
##       rolls/buns}         => {whole milk}        0.0024       0.80  3.1    24
## [9]  {root vegetables,                                                       
##       rice}               => {whole milk}        0.0024       0.77  3.0    24
## [10] {butter milk,                                                           
##       whipped/sour cream} => {whole milk}        0.0029       0.76  3.0    29
## [11] {onions,                                                                
##       butter}             => {whole milk}        0.0031       0.75  2.9    30
## [12] {butter,                                                                
##       soft cheese}        => {whole milk}        0.0020       0.74  2.9    20
## [13] {cream cheese ,                                                         
##       sugar}              => {whole milk}        0.0020       0.74  2.9    20
## [14] {butter,                                                                
##       curd}               => {whole milk}        0.0049       0.72  2.8    48
## [15] {yogurt,                                                                
##       specialty cheese}   => {whole milk}        0.0020       0.71  2.8    20
## [16] {dessert,                                                               
##       butter milk}        => {whole milk}        0.0020       0.71  2.8    20
## [17] {domestic eggs,                                                         
##       sugar}              => {whole milk}        0.0036       0.71  2.8    35
## [18] {yogurt,                                                                
##       baking powder}      => {whole milk}        0.0033       0.71  2.8    32
## [19] {whipped/sour cream,                                                    
##       sliced cheese}      => {whole milk}        0.0027       0.71  2.8    27
## [20] {butter,                                                                
##       coffee}             => {whole milk}        0.0034       0.70  2.7    33
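
Note that `sort(regra3[1:20], ...)` sorts only the first 20 rules in mining order, which is why the highest lift shown above (3.8) is below the maximum of 4.6 reported by the summary. If the 20 rules with the highest lift overall are wanted, one possible alternative is to sort first and subset afterwards. The names `rules3`, `first20_sorted`, and `top20_by_lift` are illustrative.

```r
library(arules)
data(Groceries)

rules3 <- apriori(Groceries,
                  parameter = list(supp = 0.002, conf = 0.7, minlen = 2),
                  control   = list(verbose = FALSE))

# Sorts only the first 20 rules in mining order (what the call above does)
first20_sorted <- sort(rules3[1:20], by = "lift")

# Sorts all rules first, then keeps the 20 with the highest lift
top20_by_lift <- head(sort(rules3, by = "lift"), 20)

print(max(quality(first20_sorted)$lift))
print(max(quality(top20_by_lift)$lift))
```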

Finding the subset of rules that contain the item beef

regra_beef <- subset(regra3, items %in% "beef")

inspect(regra_beef)
##     lhs                   rhs                support confidence lift count
## [1] {beef,                                                                
##      other vegetables,                                                    
##      domestic eggs}    => {whole milk}        0.0025       0.76  3.0    25
## [2] {beef,                                                                
##      tropical fruit,                                                      
##      root vegetables}  => {other vegetables}  0.0027       0.73  3.8    27
## [3] {beef,                                                                
##      tropical fruit,                                                      
##      rolls/buns}       => {whole milk}        0.0021       0.78  3.0    21
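
`subset()` accepts other conditions as well: the match can be limited to one side of the rule, combined with quality measures, or done by partial label matching with `%pin%`. The sketch below re-mines rule set 3 under the illustrative name `rules3`; the lift cut-off of 3.5 is an arbitrary value picked for the example.

```r
library(arules)
data(Groceries)

rules3 <- apriori(Groceries,
                  parameter = list(supp = 0.002, conf = 0.7, minlen = 2),
                  control   = list(verbose = FALSE))

beef_anywhere <- subset(rules3, items %in% "beef")              # beef on either side
beef_lhs      <- subset(rules3, lhs %in% "beef")                # beef in the antecedent only
beef_strong   <- subset(rules3, lhs %in% "beef" & lift > 3.5)   # plus a lift filter
fruit_rules   <- subset(rules3, lhs %pin% "fruit")              # partial match on item labels

print(c(anywhere = length(beef_anywhere),
        lhs      = length(beef_lhs),
        strong   = length(beef_strong),
        fruit    = length(fruit_rules)))
```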

Finding segmented rules

Segmentation 1 - Which items lead to the purchase of a given product (beef)? These are the rules with beef on the right-hand side.

regra3_seg1 <- apriori(Groceries, parameter = list(supp = 0.002, conf = 0.2),
                       appearance = list(default = "lhs", rhs = "beef"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.2    0.1    1 none FALSE            TRUE       5   0.002      1
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 19 
## 
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [147 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [16 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
inspect(regra3_seg1)
##      lhs                   rhs    support confidence lift count
## [1]  {pork,                                                    
##       root vegetables}  => {beef}  0.0027       0.20  3.8    27
## [2]  {root vegetables,                                         
##       butter}           => {beef}  0.0029       0.23  4.4    29
## [3]  {root vegetables,                                         
##       newspapers}       => {beef}  0.0027       0.24  4.6    27
## [4]  {citrus fruit,                                            
##       root vegetables}  => {beef}  0.0039       0.22  4.2    38
## [5]  {root vegetables,                                         
##       soda}             => {beef}  0.0040       0.21  4.1    39
## [6]  {root vegetables,                                         
##       rolls/buns}       => {beef}  0.0050       0.21  3.9    49
## [7]  {pork,                                                    
##       other vegetables,                                        
##       whole milk}       => {beef}  0.0023       0.23  4.4    23
## [8]  {root vegetables,                                         
##       whole milk,                                              
##       butter}           => {beef}  0.0020       0.25  4.7    20
## [9]  {other vegetables,                                        
##       whole milk,                                              
##       domestic eggs}    => {beef}  0.0025       0.21  3.9    25
## [10] {citrus fruit,                                            
##       root vegetables,                                         
##       other vegetables} => {beef}  0.0021       0.21  3.9    21
## [11] {citrus fruit,                                            
##       root vegetables,                                         
##       whole milk}       => {beef}  0.0022       0.24  4.7    22
## [12] {tropical fruit,                                          
##       root vegetables,                                         
##       other vegetables} => {beef}  0.0027       0.22  4.3    27
## [13] {tropical fruit,                                          
##       root vegetables,                                         
##       whole milk}       => {beef}  0.0025       0.21  4.0    25
## [14] {root vegetables,                                         
##       other vegetables,                                        
##       soda}             => {beef}  0.0020       0.25  4.7    20
## [15] {root vegetables,                                         
##       other vegetables,                                        
##       rolls/buns}       => {beef}  0.0028       0.23  4.4    28
## [16] {root vegetables,                                         
##       whole milk,                                              
##       rolls/buns}       => {beef}  0.0028       0.22  4.3    28
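
Confidences of 0.20 to 0.25 may look low, but they have to be read against how often beef is bought at all. Since lift = confidence / support(rhs), the base rate of beef can be recovered from the printed values and compared with `itemFrequency()`. A minimal sketch, using the printed (rounded) figures of rule [8]:

```r
library(arules)
data(Groceries)

# Relative frequency (support) of beef across all transactions
beef_support <- itemFrequency(Groceries)["beef"]

# Rule [8] above: {root vegetables, whole milk, butter} => {beef}
confidence_rule <- 0.25   # as printed
lift_rule       <- 4.7    # as printed

# lift = confidence / support(rhs), so support(rhs) = confidence / lift
implied_support <- confidence_rule / lift_rule

print(unname(beef_support))
print(implied_support)
```

The two values should be close but not identical, because the printed confidence and lift are rounded.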

Segmentation 2 - Which items follow from the purchase of a given product (beef)? These are the rules with beef on the left-hand side.

regra3_seg2 <- apriori(Groceries, parameter = list(supp = 0.002, conf = 0.2),
                       appearance = list(default = "rhs", lhs = "beef"))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.2    0.1    1 none FALSE            TRUE       5   0.002      1
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 19 
## 
## set item appearances ...[1 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [147 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 done [0.00s].
## writing ... [6 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
inspect(regra3_seg2)
##     lhs       rhs                support confidence lift count
## [1] {}     => {whole milk}       0.256   0.26       1.0  2513 
## [2] {beef} => {root vegetables}  0.017   0.33       3.0   171 
## [3] {beef} => {yogurt}           0.012   0.22       1.6   115 
## [4] {beef} => {rolls/buns}       0.014   0.26       1.4   134 
## [5] {beef} => {other vegetables} 0.020   0.38       1.9   194 
## [6] {beef} => {whole milk}       0.021   0.41       1.6   209
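
Rule [1], {} => {whole milk}, has an empty left-hand side: its confidence is simply the support of whole milk (2513 / 9835), and it appears because minlen is 1 in the parameter specification above. A possible way to keep only the rules that actually start from beef is to set minlen = 2, as in this sketch (`seg2` is an illustrative name):

```r
library(arules)
data(Groceries)

seg2 <- apriori(Groceries,
                parameter  = list(supp = 0.002, conf = 0.2, minlen = 2),
                appearance = list(default = "rhs", lhs = "beef"),
                control    = list(verbose = FALSE))

print(length(seg2))
inspect(sort(seg2, by = "lift"))
```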

Lesson 5 - Visualizing the mined rules

Installing and loading arulesViz

#install.packages("arulesViz")
library(arulesViz)
## Warning: package 'arulesViz' was built under R version 3.5.2
## Loading required package: grid
# Plot the mined rules as a graph; an interactive mode is also available

plot(regra3_seg2, method = "graph")

plot(regra3_seg2, method = "graph", interactive = TRUE)
## Warning in plot.rules(regra3_seg2, method = "graph", interactive = TRUE):
## The parameter interactive is deprecated. Use engine='interactive' instead.