import pandas as pd
import numpy as np
import seaborn as sns

Aqui eu estou apenas definindo algumas questões estáticas na apresentação dos gráficos

sns.set_style('whitegrid')
sns.set_palette('flare_r')
sns.color_palette('flare_r')

Afim de me aprofundar um pouco mais em data science, regressões e machine learning, baixei o conjunto de dados disponibilizado nesse link (por Nik Davis):
https://www.kaggle.com/nikdavis/steam-store-games

Dentro do arquivo de download, haviam 6 arquivos .csv. Dessa forma, vou primeiro dar uma olhadinha em cada 1 deles para ter mais noção do tipo e variação de dados que tenho a disposição.

df1 = pd.read_csv('Dados Brutos/steam.csv')

df1.head()

df1.shape

(27075, 18)

df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27075 entries, 0 to 27074
Data columns (total 18 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   appid             27075 non-null  int64  
 1   name              27075 non-null  object 
 2   release_date      27075 non-null  object 
 3   english           27075 non-null  int64  
 4   developer         27075 non-null  object 
 5   publisher         27075 non-null  object 
 6   platforms         27075 non-null  object 
 7   required_age      27075 non-null  int64  
 8   categories        27075 non-null  object 
 9   genres            27075 non-null  object 
 10  steamspy_tags     27075 non-null  object 
 11  achievements      27075 non-null  int64  
 12  positive_ratings  27075 non-null  int64  
 13  negative_ratings  27075 non-null  int64  
 14  average_playtime  27075 non-null  int64  
 15  median_playtime   27075 non-null  int64  
 16  owners            27075 non-null  object 
 17  price             27075 non-null  float64
dtypes: float64(1), int64(8), object(9)
memory usage: 3.7+ MB

df1.describe()

df2 = pd.read_csv('Dados Brutos/steam_description_data.csv')

df2.head(10)

df2.shape

(27334, 4)

df2.describe()

df3 = pd.read_csv('Dados Brutos/steam_media_data.csv')

df3.head()

df4 = pd.read_csv('Dados Brutos/steam_requirements_data.csv')

df4.head()

df5 = pd.read_csv('Dados Brutos/steam_support_info.csv')

df5.head()

df6 = pd.read_csv('Dados Brutos/steamspy_tag_data.csv')

df6.head()

Deu pra dar uma boa olhada nas bases. Provavelmente a mais útil pro que tenho em mente será a primeira mesmo, porém, acho que existe bastante espaço pra fazer algum tipo de nuvem de palavras com a segunda, que contém descrições dos jogos, e talvez dê pra ter uma noção de qual a configuração ideal de computador pra rodar jogos da steam usando essas informações da quarta base.

Analisando o primeiro dataframe¶

É possível notar que, na primeira base, existe um campo de data. Ele está com a tipagem de 'object', porém, acredito que faça mais sentido avalia-lo como data.

df = df1.copy()

df.head()

Aparentemente, a primeira coluna possui os dados de identificação, então, provavelmente só sera utilizada para eventuais joins.
Já a segunda coluna apresenta os nomes dos jogos. Muito provavelmente só usaremos para identificação também, dessa forma...
Podemos começar a avaliar as colunas a partir da terceira, que possui data.

df['release_date'] = pd.to_datetime(df['release_date'])

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27075 entries, 0 to 27074
Data columns (total 18 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   appid             27075 non-null  int64         
 1   name              27075 non-null  object        
 2   release_date      27075 non-null  datetime64[ns]
 3   english           27075 non-null  int64         
 4   developer         27075 non-null  object        
 5   publisher         27075 non-null  object        
 6   platforms         27075 non-null  object        
 7   required_age      27075 non-null  int64         
 8   categories        27075 non-null  object        
 9   genres            27075 non-null  object        
 10  steamspy_tags     27075 non-null  object        
 11  achievements      27075 non-null  int64         
 12  positive_ratings  27075 non-null  int64         
 13  negative_ratings  27075 non-null  int64         
 14  average_playtime  27075 non-null  int64         
 15  median_playtime   27075 non-null  int64         
 16  owners            27075 non-null  object        
 17  price             27075 non-null  float64       
dtypes: datetime64[ns](1), float64(1), int64(8), object(8)
memory usage: 3.7+ MB

Agora que o campo 'release_date' está com a tipagem correta, vamos dar uma olhada nas datas e tentar descobrir algumas informações acerca da idade dos jogos.

df['release_date'].max()

Timestamp('2019-05-01 00:00:00')

df['release_date'].min()

Timestamp('1997-06-30 00:00:00')

df[df['release_date'] == df['release_date'].max()]

df[df['release_date'] == df['release_date'].min()]

Aqui é interessante ter uma ideia da distribuição dos jogos ao longo dos anos. Podemos supor, em um primeiro momento, que, uma vez que a Steam foi lançada por volta de 2003, e, começou a incluir jogos de plataformas como Linux e MacOS depois de 2009, provavelmente exista uma maior distribuição de jogos conforme os anos avançam.


PT-BR	EN

df['Year'] = df['release_date'].apply(lambda x: x.year)

ax = sns.histplot(data=df, x='Year', stat='probability', discrete=True)
ax.figure.set_size_inches(15,6)

Esse dataset foi extraido durante o mês de maio de 2019, por isso, é notável essa queda ao fim do gráfico (podemos imaginar que a coluna referente ao ano de 2019 seria maior se os dados tivessem sido extraidos em dezembro). Podemos dar uma olhada na quarta coluna

df['english'].unique()

array([1, 0], dtype=int64)

Aparentemente, só possui 2 valores, indicando se o jogo tem suporte a lingua inglesa ou não. Vale a pena dar uma olhada nos jogos que não possuem esse suporte.

df[df['english'] == 0]

df[df['english'] == 0].count()

appid               511
name                511
release_date        511
english             511
developer           511
publisher           511
platforms           511
required_age        511
categories          511
genres              511
steamspy_tags       511
achievements        511
positive_ratings    511
negative_ratings    511
average_playtime    511
median_playtime     511
owners              511
price               511
Year                511
dtype: int64

Boa parte deles parece ter vindo de países asiáticos. Interessante. Existe a possibilidade de, devido a esses jogos serem produzidos em países asiáticos, o valor dos mesmos seja menor, uma vez que, na média, o valor das moedas asiaticas é menor que da moeda americana ou européia. Além disso, existe também a possibilidade de a ausência de suporte a lingua inglesa indicar limitações no capital investido na confecção do jogo.
Podemos dar uma olhada melhor nesse seleto grupo em outro momento, por enquanto, vamos prosseguir.

df['developer'].unique()

array(['Valve', 'Gearbox Software', 'Valve;Hidden Path Entertainment',
       ..., 'SHEN JIAWEI', 'Semyon Maximov', 'Adept Studios GD'],
      dtype=object)

df['developer'].mode()

0    Choice of Games
dtype: object

Essa moda não reflete, de fato, a moda real do conjunto de desenvolvedores, já que vários deles estão concatenados na mesma coluna.

desenv_split = df['developer'].apply(lambda x: pd.Series(x.split(';')))

desenv_split

desenv = []
for index, row in desenv_split.iterrows():
    for i in desenv_split.columns:
        desenv.append(row[i])

(Essa foi a solução que eu achei para listar todas as desenvolvedoras. Não foi das mais elegantes, então, se você tiver alguma ideia pra substituir essa monstruosidade, eu estou aberto a sugestões)

desenv = pd.Series(desenv)

desenv.dropna(inplace=True)

desenv.mode()

0    Choice of Games
dtype: object

Agora temos a verdadeira moda (que, ironicamente, era a mesma de antes).

Infelizmente, esses dados são muito variados, então, acho improvável transforma-los em variáveis binárias ou ordinais. Dessa forma, existe um maior proveito em simplesmente tentar identificar algumas desenvolvedoras mais proeminentes.

desenv.value_counts().head(10)

Choice of Games               94
KOEI TECMO GAMES CO., LTD.    75
Ripknot Systems               62
Laush Dmitriy Sergeevich      51
Nikita "Ghost_RUS"            51
Dexion Games                  45
Feral Interactive (Mac)       44
RewindApp                     44
Hosted Games                  42
Blender Games                 40
dtype: int64

desenv.value_counts().tail(10)

Giant Box Games        1
Nuclear Tales          1
Frost Dog Studio       1
Chicken in The Corn    1
Lee Fuller             1
MegawattsCo            1
Rockstar Leeds         1
Redro Games            1
Andy Campbell          1
Toothless T-Rex        1
dtype: int64

Provavelmente teremos o mesmo problema com a publisher.

df['publisher'].unique()

array(['Valve', 'Mark Healey', 'Tripwire Interactive', ..., 'MonteCube',
       'Velvet Paradise Games', 'SHEN JIAWEI'], dtype=object)

df['publisher'].mode()

0    Big Fish Games
dtype: object

publi_split = df['publisher'].apply(lambda x: pd.Series(x.split(';')))

publi_split

publi = []
for index, row in publi_split.iterrows():
    for i in publi_split.columns:
        publi.append(row[i])

publi = pd.Series(publi)

publi.dropna(inplace=True)

publi.mode()

0    Big Fish Games
dtype: object

publi.value_counts().head(10)

Big Fish Games       212
Strategy First       136
THQ Nordic           119
Ubisoft              114
Plug In Digital      111
Square Enix          107
Nightdive Studios    101
1C Entertainment      96
Sekai Project         96
Choice of Games       94
dtype: int64

publi.value_counts().tail(10)

Anatola Howard        1
Team Frizz            1
Pixel Wonder          1
Wispfire              1
Arif Games            1
Sergey Sevostyanov    1
Bullet Bears          1
FYRG                  1
Frogmouth             1
Mindwedge             1
dtype: int64

Algumas dessas informações são bem interessantes. Particularmente, eu não conhecia algumas dessas empresas.

Agora, seguiremos para avaliar as plataformas

df['platforms'].unique()

array(['windows;mac;linux', 'windows;mac', 'windows', 'windows;linux',
       'mac', 'mac;linux', 'linux'], dtype=object)

Meu palpite é de que, jogos que estão disponíveis em mais plataformas provavelmente venderiam mais, o que pode acarretar no número maior de owners.

Pensando em como tratar esse campo, talvez seja interessante criar 3 novas colunas de valor binário, pra determinar se um dado jogo está disponível para essa plataforma ou não.

df['platforms_windows'] = df['platforms'].str.contains('windows', regex=False).astype(int)
df['platforms_mac'] = df['platforms'].str.contains('mac', regex=False).astype(int)
df['platforms_linux'] = df['platforms'].str.contains('linux', regex=False).astype(int)

df.tail(10)

Essa transformação nos permite, também, ter uma noção de quanto jogos estão disponíveis em cada plataforma, simplesmente efetuando uma soma.

df['platforms_windows'].sum()

27070

df['platforms_mac'].sum()

8066

df['platforms_linux'].sum()

5235

Aparentemente, existem mais jogos disponíveis para windows, e, por último, para linux. Acho que é interessante calcular a correlação entre essas 3 variáveis, pra evitar problemas de multicolinearidade futuros.

df[['platforms_windows','platforms_mac','platforms_linux']].corr()

Existe uma correlação razoável entre mac e linux. Minha hipótese é a de que, uma vez que windows possui mais entradas, provavelmente quando uma desenvolvedora decide produzir para alguma plataforma, ela opta pelo windows, e, quando quer desenvolver para mais de uma, acaba por optar por desenvolver para linux E mac, e não linux OU mac. Podemos dar uma olhada novamente na coluna platforms, pra ter uma ideia melhor.

df[df['platforms'] == 'windows;mac;linux'].shape[0]

4623

df[df['platforms'] == 'mac;linux'].shape[0]

1

df[df['platforms'] == 'windows;mac'].shape[0]

3439

df[df['platforms'] == 'windows;linux'].shape[0]

610

Como eu pensei. Apesar de poucos jogos serem lançados somente para mac e linux (1), boa parte deles é lançado juntamente com windows. Isso significa que, uma vez que o jogo saia para linux, é esperado também que saia para mac, e o inverso é, em menor grau, também verdade.

Agora, vamos ver a variável 'required_age'

df['required_age'].unique()

array([ 0, 18, 16, 12,  7,  3], dtype=int64)

O campo 'required_age' está em formato numérico, o que pode facilitar a nossa eventual regressão.

ax = sns.histplot(data=df, x='required_age', discrete=True, stat='probability')
ax.figure.set_size_inches(15,6)
ax.set(xticks=range(0,20,2))

[[<matplotlib.axis.XTick at 0x2c7e274f9a0>,
  <matplotlib.axis.XTick at 0x2c7e274fd30>,
  <matplotlib.axis.XTick at 0x2c7e276a070>,
  <matplotlib.axis.XTick at 0x2c7e277a4f0>,
  <matplotlib.axis.XTick at 0x2c7e277aa90>,
  <matplotlib.axis.XTick at 0x2c7e277f100>,
  <matplotlib.axis.XTick at 0x2c7e277f6a0>,
  <matplotlib.axis.XTick at 0x2c7e277fe80>,
  <matplotlib.axis.XTick at 0x2c7e2784130>,
  <matplotlib.axis.XTick at 0x2c7e277f2e0>]]

Parace que a grande maioria dos jogos na nossa amostra não requer idade mínima. Pode ser interessante, mais pra frente, dar uma olhada na quantidade de pessoas que detem jogos nas últimas duas faixas de idade, por curiosidade.

Agora, vamos prosseguir para a próxima variável.

df['categories'].unique()

array(['Multi-player;Online Multi-Player;Local Multi-Player;Valve Anti-Cheat enabled',
       'Multi-player;Valve Anti-Cheat enabled',
       'Single-player;Multi-player;Valve Anti-Cheat enabled', ...,
       'Online Multi-Player;Steam Achievements;Full controller support;In-App Purchases;Steam Cloud',
       'Multi-player;Local Multi-Player;Co-op;Local Co-op;Shared/Split Screen',
       'Multi-player;Online Multi-Player;Cross-Platform Multiplayer;Stats'],
      dtype=object)

len(df['categories'].unique().tolist())

3333

Parece que existe uma grande quantidade de variações de categorias. Vou tentar utilizar a mesma técnica que utilizamos para separar developers e publishers, e, dependendo de como ficar, talvez seja viável transforma-las em variáveis binárias, como no caso das platforms.

categ_split = df['categories'].apply(lambda x: pd.Series(x.split(';')))

categ_split

Parece que um mesmo jogo pode ter até 18 categorias diferentes. Apesar de serem muitas, eu acredito que isso possa influenciar tanto a quantidade de pessoas que detem o jogo, como o seu preço. Acho que vou ter que criar colunas binárias pra eles então...

categ = []
for index, row in categ_split.iterrows():
    for i in categ_split.columns:
        categ.append(row[i])

categ = pd.Series(categ)

categ.dropna(inplace=True)

categ.unique()

array(['Multi-player', 'Online Multi-Player', 'Local Multi-Player',
       'Valve Anti-Cheat enabled', 'Single-player', 'Steam Cloud',
       'Steam Achievements', 'Steam Trading Cards', 'Captions available',
       'Partial Controller Support', 'Includes Source SDK',
       'Cross-Platform Multiplayer', 'Stats', 'Commentary available',
       'Includes level editor', 'Steam Workshop', 'In-App Purchases',
       'Co-op', 'Full controller support', 'Steam Leaderboards',
       'SteamVR Collectibles', 'Online Co-op', 'Shared/Split Screen',
       'Local Co-op', 'MMO', 'VR Support', 'Mods', 'Mods (require HL2)',
       'Steam Turn Notifications'], dtype=object)

len(categ.unique())

29

Parece que me precipitei. Essencialmente, existem 29 categorias diferentes, só nesta amostra. Talvez seja sim possível que um jogo tenha mais de 18 categorias então. POrém, é perceptível que algumas dessas categorias parecem um pouco redundantes. Talvez seja interessante calcular correlação entre elas. Também pode ser interessante calcular a quantidade de categorias que cada jogo tem, já que o simples número de categorias pode ser um indicador também.

categ.mode()

0    Single-player
dtype: object

categ.value_counts().head(10)

Single-player                 25678
Steam Achievements            14130
Steam Trading Cards            7918
Steam Cloud                    7219
Full controller support        5695
Partial Controller Support     4234
Multi-player                   3974
Steam Leaderboards             3439
Online Multi-Player            2487
Shared/Split Screen            2152
dtype: int64

categ.value_counts().tail(10)

In-App Purchases            690
MMO                         421
VR Support                  231
Commentary available        144
Valve Anti-Cheat enabled     94
Steam Turn Notifications     63
SteamVR Collectibles         40
Includes Source SDK          35
Mods                          2
Mods (require HL2)            1
dtype: int64

categ.unique().tolist()

['Multi-player',
 'Online Multi-Player',
 'Local Multi-Player',
 'Valve Anti-Cheat enabled',
 'Single-player',
 'Steam Cloud',
 'Steam Achievements',
 'Steam Trading Cards',
 'Captions available',
 'Partial Controller Support',
 'Includes Source SDK',
 'Cross-Platform Multiplayer',
 'Stats',
 'Commentary available',
 'Includes level editor',
 'Steam Workshop',
 'In-App Purchases',
 'Co-op',
 'Full controller support',
 'Steam Leaderboards',
 'SteamVR Collectibles',
 'Online Co-op',
 'Shared/Split Screen',
 'Local Co-op',
 'MMO',
 'VR Support',
 'Mods',
 'Mods (require HL2)',
 'Steam Turn Notifications']

df_corr = pd.DataFrame()
df_corr

for i in categ.unique().tolist():
    nome = 'categories_'+i
    df[nome] = df['categories'].str.contains(i, regex=False).astype(int)
    df_corr[nome] = df['categories'].str.contains(i, regex=False).astype(int)

df_corr.columns

Index(['categories_Multi-player', 'categories_Online Multi-Player',
       'categories_Local Multi-Player', 'categories_Valve Anti-Cheat enabled',
       'categories_Single-player', 'categories_Steam Cloud',
       'categories_Steam Achievements', 'categories_Steam Trading Cards',
       'categories_Captions available',
       'categories_Partial Controller Support',
       'categories_Includes Source SDK',
       'categories_Cross-Platform Multiplayer', 'categories_Stats',
       'categories_Commentary available', 'categories_Includes level editor',
       'categories_Steam Workshop', 'categories_In-App Purchases',
       'categories_Co-op', 'categories_Full controller support',
       'categories_Steam Leaderboards', 'categories_SteamVR Collectibles',
       'categories_Online Co-op', 'categories_Shared/Split Screen',
       'categories_Local Co-op', 'categories_MMO', 'categories_VR Support',
       'categories_Mods', 'categories_Mods (require HL2)',
       'categories_Steam Turn Notifications'],
      dtype='object')

df['categories_QTD'] = 1+ df['categories'].str.contains(';', regex=False).astype(int)
df_corr['categories_QTD'] = 1+ df['categories'].str.contains(';', regex=False).astype(int)

df_corr.head(10)

df_corr.corr()

Convenhamos, essa tabela de correlação ficou muito grande, meio inviável de se avaliar. Por isso, vou construir um heatmap abaixo pra ter uma noção um pouco melhor.

ax = sns.heatmap(df_corr.corr(), linewidths=.2, xticklabels=True, yticklabels=True)
ax.figure.set_size_inches(15,15)

Felizmente, a maioria das correlações não foi tão alta, porém, houveram algumas aqui e ali que incomodaram um pouco. Em especial, as colunas:

Variável A	Variável B
'categories_Mods'	'categories_Mods (require HL2)'
'categories_Shared/Split Screen'	'categories_Local Co-op'
'categories_Online Co-op'	'categories_Online Multi-Player'
'categories_Co-op'	'categories_Local Co-op'
'categories_Co-op'	'categories_Online Co-op'

Houveram mais algumas correlações, mas, acredito que com essas que elenquei, já podemos ter uma ideia do que eventualmente jogar fora e o que manter. Já adicionamos as colunas binárias ao dataframe, então, podemos prosseguir para a próxima coluna.

df.columns

Index(['appid', 'name', 'release_date', 'english', 'developer', 'publisher',
       'platforms', 'required_age', 'categories', 'genres', 'steamspy_tags',
       'achievements', 'positive_ratings', 'negative_ratings',
       'average_playtime', 'median_playtime', 'owners', 'price', 'Year',
       'platforms_windows', 'platforms_mac', 'platforms_linux',
       'categories_Multi-player', 'categories_Online Multi-Player',
       'categories_Local Multi-Player', 'categories_Valve Anti-Cheat enabled',
       'categories_Single-player', 'categories_Steam Cloud',
       'categories_Steam Achievements', 'categories_Steam Trading Cards',
       'categories_Captions available',
       'categories_Partial Controller Support',
       'categories_Includes Source SDK',
       'categories_Cross-Platform Multiplayer', 'categories_Stats',
       'categories_Commentary available', 'categories_Includes level editor',
       'categories_Steam Workshop', 'categories_In-App Purchases',
       'categories_Co-op', 'categories_Full controller support',
       'categories_Steam Leaderboards', 'categories_SteamVR Collectibles',
       'categories_Online Co-op', 'categories_Shared/Split Screen',
       'categories_Local Co-op', 'categories_MMO', 'categories_VR Support',
       'categories_Mods', 'categories_Mods (require HL2)',
       'categories_Steam Turn Notifications', 'categories_QTD'],
      dtype='object')

df['genres'].unique()

array(['Action', 'Action;Free to Play', 'Action;Free to Play;Strategy',
       ...,
       'Action;Adventure;Indie;Massively Multiplayer;RPG;Strategy;Early Access',
       'Action;Adventure;Casual;Free to Play;Indie;RPG;Simulation;Sports;Strategy',
       'Casual;Free to Play;Massively Multiplayer;RPG;Early Access'],
      dtype=object)

len(df['genres'].unique())

1552

Parece que teremos que fazer a mesma coisa que fizemos nas últimas variáveis qualitativas.

gen_split = df['genres'].apply(lambda x: pd.Series(x.split(';')))

gen_split

gen = []
for index, row in gen_split.iterrows():
    for i in gen_split.columns:
        gen.append(row[i])

gen = pd.Series(gen)

gen.dropna(inplace=True)

len(gen.unique())

29

gen.unique()

array(['Action', 'Free to Play', 'Strategy', 'Adventure', 'Indie', 'RPG',
       'Animation & Modeling', 'Video Production', 'Casual', 'Simulation',
       'Racing', 'Violent', 'Massively Multiplayer', 'Nudity', 'Sports',
       'Early Access', 'Gore', 'Utilities', 'Design & Illustration',
       'Web Publishing', 'Education', 'Software Training',
       'Sexual Content', 'Audio Production', 'Game Development',
       'Photo Editing', 'Accounting', 'Documentary', 'Tutorial'],
      dtype=object)

gen.mode()

0    Indie
dtype: object

gen.value_counts().head(10)

Indie           19421
Action          11903
Casual          10210
Adventure       10032
Strategy         5247
Simulation       5194
RPG              4311
Early Access     2954
Free to Play     1704
Sports           1322
dtype: int64

gen.value_counts().tail(10)

Education            51
Video Production     38
Software Training    31
Audio Production     29
Web Publishing       28
Game Development     17
Photo Editing        12
Accounting            6
Documentary           1
Tutorial              1
dtype: int64

Nik Davis, o cara que extraiu essa base, tinha comentado que poderiam haver alguns jogos que não eram realmente jogos, e sim software disponível na steam. Me parece que esses gêneros encontrados com a função tail() podem estar relacionados a esses softwares.

gen.value_counts().tail(50)

Indie                    19421
Action                   11903
Casual                   10210
Adventure                10032
Strategy                  5247
Simulation                5194
RPG                       4311
Early Access              2954
Free to Play              1704
Sports                    1322
Racing                    1024
Violent                    843
Massively Multiplayer      723
Gore                       537
Nudity                     266
Sexual Content             245
Utilities                  146
Design & Illustration       87
Animation & Modeling        79
Education                   51
Video Production            38
Software Training           31
Audio Production            29
Web Publishing              28
Game Development            17
Photo Editing               12
Accounting                   6
Documentary                  1
Tutorial                     1
dtype: int64

gen.value_counts().tail(13).index.tolist()

['Utilities',
 'Design & Illustration',
 'Animation & Modeling',
 'Education',
 'Video Production',
 'Software Training',
 'Audio Production',
 'Web Publishing',
 'Game Development',
 'Photo Editing',
 'Accounting',
 'Documentary',
 'Tutorial']

Esses parecem ser os gêneros a serem verificados. Vou filtrar o dataframe para dar uma olhada nos titulos desses "jogos".

nomes = []
for i in gen.value_counts().tail(13).index.tolist():
    
    nomes.extend(df[df['genres'].str.contains(i)]['name'].tolist())

set(nomes)

{'001 Game Creator',
 '10k',
 '3D ParticleGen Visual FX',
 '3DCoat 4.8',
 '3DCoat Modding Tool',
 '3DF Zephyr Lite Steam Edition',
 '3DMark 11',
 '8 Queens',
 'ASMR Universe',
 'ASMR Universe 2',
 'Aartform Curvy 3D 3.0',
 'Amaze: Interactive VR Movies',
 'AnimVR',
 'AppGameKit: Easy Game Development',
 'Arcade Artist',
 'Arcane Mapper',
 'Atlantia Animation',
 'AutoTileGen',
 "Axis Game Factory's AGFPRO v3",
 'BASIC8',
 'BeanVR—The Social VR APP',
 'Belle II in Virtual Reality',
 'Blocks by Google',
 'Button Music',
 'Buzz Aldrin: Cycling Pathways to Mars',
 'CINEVR - The Movie Theater',
 'CPUCores :: Maximize Your FPS',
 "Cairo's Tale: The Big Egg",
 'Campfire Pro',
 'Catzilla 4K - Advanced',
 'Chernobyl VR Project',
 'ClearMem :: Free Up Your RAM',
 'Clickteam Fusion 2.5',
 'Club Lighting',
 'ControlMyJoystick',
 'Controller Companion',
 'CopperCube 5 Game Engine',
 'CyberLink PowerDVD 18 Ultra - Media player, video player, 4k media player, 360 video',
 'DJ Mix Pads',
 'DJ Streamer',
 'DashPanel',
 'DatavizVR Demo',
 'Dating Lessons',
 'DeoVR Video Player',
 'Desktop Audio Visualizer',
 'Desktop Mascot Engine',
 'Dimension Two',
 'Dimmdrive :: Gaming Ramdrive @ 10,000+ MB/s',
 'Discord Bot Maker',
 'Disney Movies VR',
 'DlodloVRPlayer',
 'DreamDesk VR',
 'Driver Fusion - The Best Driver & Device Solution',
 'Drum Simulator',
 'Dungeon Painter Studio',
 'EF EVE™ - Volumetric Video Platform (VR & Desktop)',
 'EF-12: Fighting Game Maker',
 'ENGAGE',
 'EVGA Precision XOC',
 'EXA: The Infinite Instrument',
 'Easy™ eSports',
 'Edmersiv',
 'Electronic Piano',
 'Emoto',
 'FaceRig',
 'Fantasynth: Chez Nous',
 'FireAlpaca SE',
 'FlatOut 4: Total Insanity Workshop Tool',
 'Float',
 'Félix VR',
 'GIPHY Museum of GIF Art',
 'GIPHY World VR',
 'GTGD S1: More Than a Gamer',
 'GTGD S3 How To Make A Game',
 'Game Character Hub',
 'Game Character Hub: Portfolio Edition',
 'GameGuru',
 'GameMaker Studio 2 Desktop',
 'GameMaker Studio 2 Mobile',
 'GameMaker Studio 2 UWP',
 'GameMaker Studio 2 Web',
 'Garage Drummer VR',
 'Geekbench 3',
 'GeoVox',
 'Ghost Mountain Roller Coaster',
 'GizmoVR Video Player',
 'GlobalMap Astro',
 'Godot Engine',
 'GraviSound',
 'Grove - VR Browsing Experience',
 'Handy Dice',
 'HomestarVR',
 'Houdini Indie',
 'Impromptu Vector Field Painter',
 'J.A.C.K.',
 'Janus VR',
 'Jaunt VR - Experience Cinematic Virtual Reality',
 "Jitsumi's Game Booster",
 'Juventus VR',
 'KOPICS',
 'Keyboard Engine',
 'KeysIn1',
 'Krita',
 'KumaKuma Manga Editor',
 'Kz NTools : Fix Your Network',
 'LIV',
 'Leadwerks Game Engine',
 'Leadwerks Game Launcher',
 'Liquid Rhythm',
 'Live Wallpaper Master',
 'Live2DViewerEX',
 'MINT VR',
 'MODO indie',
 'Magic Borderless',
 'Manga Maker Comipo',
 'MermaidVR Video Player',
 'Micro Cosmic Worlds',
 'Military Operations: Benchmark',
 'Mind Massaging Machine',
 'Mindshow',
 'MorphVOX Pro - Voice Changer',
 'Multiscreens',
 'My Desktop Alive',
 'MyRPG Master',
 'NOLO HOME',
 'Natural Locomotion',
 'Neos VR',
 'NexVR Video Player',
 'NextVR - Live Sports and Entertainment in Virtual Reality',
 'Nimble Writer',
 'Oats Studios - Volume 1 Assets',
 'PD Howler 9.6 Digital Painter and Visual FX box',
 'Platfinity™',
 'Pluto',
 'Printer Forge 3D',
 'Quantum Engine',
 'RAYBEEM - Live in Your Music',
 'RPG Maker MV',
 'RPG Maker VX Ace',
 'RPG Maker XP',
 'Raffle Blast',
 'RainDesktop',
 'RainWallpaper',
 'RainWidget',
 'Red Bull Doodle Art - Global VR Gallery',
 'Remembering Pearl Harbor',
 'Renegade Tabletop Tools System',
 'Room Designer VR',
 'Rumpus',
 'RutonyChat',
 'Rytmik Lite Chiptune Synthesizer',
 'Rytmik Player',
 'Rytmik Studio',
 'Rytmik Ultimate',
 'S2ENGINE HD',
 'SAO Utils: Beta',
 'SKYBOX VR Video Player',
 'SMILE GAME BUILDER',
 'SPATIAL SOUND CARD',
 'SRPG Studio',
 'STYLY',
 'SYMMETRY alpha',
 'Sankhara',
 'ScriptVOX Studio',
 'ShapeSim - Construction Set',
 "Shoot 'Em Up Kit",
 'Sim 4K VR MediaPlayer',
 'Sim Racing Telemetry',
 'Simmetri',
 'Source Filmmaker',
 'Spriter Pro',
 'Star Swarm Stress Test',
 'Strata inStudio VR',
 'Stream Avatars',
 'Stream Buddy',
 'Supermedium - Virtual Reality Browser',
 'TCPingInfoView',
 'TcNo TimeKeeper',
 'The Adliberum Engine (ADLENGINE)',
 'The Kremer Collection VR Museum',
 'The Music Room',
 'Tilt Brush',
 'TurnSignal',
 'Tvori',
 'Ultimate Desktop Character Engine',
 'Unbound',
 'VIRTUAnimator',
 'VR Fractals',
 'VR Meditation SkyRun',
 'VR Toolbox: 360 Desktop',
 'VRCapture',
 'VRTGO',
 'Vinyl Reality',
 'VirZOOM Gamepad Emulator',
 'Virtual Battlemap',
 'Virtual Desktop',
 'Virtual Idea Area',
 'Virtual telescope',
 'Visual Novel Engine',
 'VoiceAttack',
 'VoxVR',
 'Vrideo',
 'WITHIN',
 'Wallpaper Builder',
 'Wallpaper Engine',
 'Weelco VR',
 'Whirligig VR Media Player',
 'Wing IDE 5',
 'Witly - language tutoring in VR',
 'XDeskClock',
 'XLaunchpad',
 'XSprite',
 'XWidget',
 'articy:draft 3',
 'articy:draft 3 - Flex License',
 'connect - Virtual Home (3D or VR)',
 'ePic Character Generator',
 'iVRy Driver for SteamVR',
 'liteCam Game 5:  120 FPS Game Capture + Streamer',
 'liteCam Game: 100 FPS Game Capture',
 'liteCam HD 5.0: Screen/Stream Capture',
 'mdiapp+ SE',
 'openCanvas 6',
 'openCanvas 7',
 'rumii',
 'sheepChat',
 'vBuilder',
 '英语杀'}

len(set(nomes))

228

Podemos ver que o número não é tão expressivo. E, de fato, varios dos "jogos" são, claramente, softwares de criação de jogos e afins. Vamos ver quais são as categorias ligadas a esses jogos.

df.columns

Index(['appid', 'name', 'release_date', 'english', 'developer', 'publisher',
       'platforms', 'required_age', 'categories', 'genres', 'steamspy_tags',
       'achievements', 'positive_ratings', 'negative_ratings',
       'average_playtime', 'median_playtime', 'owners', 'price', 'Year',
       'platforms_windows', 'platforms_mac', 'platforms_linux',
       'categories_Multi-player', 'categories_Online Multi-Player',
       'categories_Local Multi-Player', 'categories_Valve Anti-Cheat enabled',
       'categories_Single-player', 'categories_Steam Cloud',
       'categories_Steam Achievements', 'categories_Steam Trading Cards',
       'categories_Captions available',
       'categories_Partial Controller Support',
       'categories_Includes Source SDK',
       'categories_Cross-Platform Multiplayer', 'categories_Stats',
       'categories_Commentary available', 'categories_Includes level editor',
       'categories_Steam Workshop', 'categories_In-App Purchases',
       'categories_Co-op', 'categories_Full controller support',
       'categories_Steam Leaderboards', 'categories_SteamVR Collectibles',
       'categories_Online Co-op', 'categories_Shared/Split Screen',
       'categories_Local Co-op', 'categories_MMO', 'categories_VR Support',
       'categories_Mods', 'categories_Mods (require HL2)',
       'categories_Steam Turn Notifications', 'categories_QTD'],
      dtype='object')

df[df['name'].isin(nomes)]['categories'].unique()

array(['Steam Workshop', 'Steam Cloud', 'Steam Leaderboards',
       'Steam Trading Cards;Steam Workshop',
       'Steam Achievements;Steam Cloud;Stats', 'Single-player',
       'Single-player;Multi-player;MMO;Co-op;Shared/Split Screen;Cross-Platform Multiplayer;Steam Achievements;Steam Leaderboards',
       'Single-player;Steam Achievements;Steam Workshop;In-App Purchases;Partial Controller Support;Includes level editor',
       'Includes level editor',
       'Single-player;Multi-player;Partial Controller Support;Includes level editor',
       'Single-player;Steam Achievements;Steam Leaderboards',
       'Single-player;Multi-player;Includes level editor',
       'Single-player;Steam Achievements', 'Single-player;Steam Cloud',
       'Steam Achievements;Steam Trading Cards;Steam Workshop',
       'Single-player;Steam Trading Cards',
       'Multi-player;Cross-Platform Multiplayer',
       'VR Support;SteamVR Collectibles', 'Steam Trading Cards',
       'Includes Source SDK', 'Steam Achievements',
       'Steam Achievements;Partial Controller Support;Stats;Includes level editor',
       'Steam Achievements;Steam Workshop;Steam Cloud;Stats',
       'Full controller support;Steam Workshop;Steam Leaderboards',
       'Steam Trading Cards;Partial Controller Support',
       'Steam Achievements;Steam Trading Cards;Partial Controller Support;Includes level editor',
       'Full controller support;Steam Workshop',
       'Full controller support;Includes level editor',
       'Partial Controller Support;Includes level editor',
       'Steam Achievements;Steam Trading Cards',
       'VR Support;Steam Workshop', 'Steam Trading Cards;Steam Cloud',
       'VR Support', 'Captions available',
       'Steam Achievements;Steam Cloud',
       'Steam Achievements;Steam Trading Cards;Steam Workshop;Includes level editor',
       'Full controller support;VR Support',
       'VR Support;Partial Controller Support', 'Full controller support',
       'Single-player;Partial Controller Support',
       'Partial Controller Support', 'Single-player;Steam Workshop',
       'Steam Achievements;Steam Trading Cards;Steam Cloud;Stats;Includes level editor',
       'Steam Achievements;Includes level editor', 'Steam Cloud;Stats',
       'Steam Workshop;Partial Controller Support',
       'Single-player;Multi-player;Online Multi-Player',
       'Steam Workshop;Steam Cloud',
       'Partial Controller Support;Steam Cloud',
       'Single-player;Local Multi-Player;Local Co-op',
       'Steam Achievements;Steam Workshop;Steam Cloud',
       'Steam Achievements;Full controller support',
       'Single-player;Local Multi-Player;Local Co-op;Shared/Split Screen;Steam Achievements;Full controller support;Steam Trading Cards;Steam Cloud',
       'Steam Workshop;Partial Controller Support;Steam Cloud;Includes level editor',
       'Stats',
       'Steam Achievements;Steam Trading Cards;Steam Workshop;Steam Cloud;Stats',
       'In-App Purchases',
       'Single-player;Steam Trading Cards;Steam Cloud',
       'Steam Achievements;Steam Trading Cards;Steam Workshop;Steam Cloud',
       'Single-player;Multi-player;Online Multi-Player;Cross-Platform Multiplayer;Partial Controller Support',
       'Single-player;Multi-player;Steam Workshop',
       'Single-player;Multi-player;Online Multi-Player;Co-op;Online Co-op;Cross-Platform Multiplayer',
       'Single-player;Multi-player;Online Multi-Player;MMO;Co-op;Online Co-op;Includes level editor',
       'Captions available;Steam Cloud;Includes level editor',
       'Steam Workshop;Includes level editor',
       'Full controller support;In-App Purchases',
       'Steam Cloud;Includes level editor',
       'In-App Purchases;Includes level editor',
       'Single-player;Multi-player;Online Multi-Player;Cross-Platform Multiplayer;Includes level editor',
       'Steam Workshop;Partial Controller Support;Includes level editor',
       'Single-player;Full controller support;In-App Purchases',
       'Steam Workshop;Steam Cloud;Stats',
       'Single-player;Full controller support',
       'Online Multi-Player;Online Co-op;Cross-Platform Multiplayer',
       'Single-player;Captions available',
       'Single-player;Includes level editor',
       'Steam Workshop;Includes Source SDK',
       'Steam Achievements;Steam Workshop', 'In-App Purchases;Stats',
       'Single-player;Steam Cloud;Includes level editor',
       'Steam Workshop;Steam Cloud;Includes level editor'], dtype=object)

df[df['name'].isin(nomes)]['categories'].mode()

0    Single-player
dtype: object

df[df['name'].isin(nomes)]['categories'].value_counts().head(10)

Single-player                 34
Partial Controller Support    19
Steam Workshop                16
Steam Achievements            12
Steam Cloud                   11
Includes level editor         11
Full controller support        8
In-App Purchases               8
Steam Trading Cards            6
VR Support                     6
Name: categories, dtype: int64

Considerando tanto as categorias como os gêneros, é possível afirmar com um pouco mais de propriedade que essas entradas são, de fato, em grande maioria, ferramentas de desenvolvimento de jogos. Como o volume também é baixo, vou exclui-las do dataframe.

df = df[~df['name'].isin(nomes)]

df.head()

Agora que limpamos a base de prováveis entradas de software, vale a pena reiniciar o processo de separação dos gêneros. Imagino que, como sobraram menos gêneros, talvez seja possível transforma-los em binário também.

df['genres'].unique()

array(['Action', 'Action;Free to Play', 'Action;Free to Play;Strategy',
       ...,
       'Action;Adventure;Indie;Massively Multiplayer;RPG;Strategy;Early Access',
       'Action;Adventure;Casual;Free to Play;Indie;RPG;Simulation;Sports;Strategy',
       'Casual;Free to Play;Massively Multiplayer;RPG;Early Access'],
      dtype=object)

gen_split = df['genres'].apply(lambda x: pd.Series(x.split(';')))

gen_split

gen = []
for index, row in gen_split.iterrows():
    for i in gen_split.columns:
        gen.append(row[i])

gen = pd.Series(gen)

gen.dropna(inplace=True)

gen.mode()

0    Indie
dtype: object

gen.unique()

array(['Action', 'Free to Play', 'Strategy', 'Adventure', 'Indie', 'RPG',
       'Casual', 'Simulation', 'Racing', 'Violent',
       'Massively Multiplayer', 'Nudity', 'Sports', 'Early Access',
       'Gore', 'Sexual Content'], dtype=object)

len(gen.unique())

16

gen.value_counts().head(10)

Indie           19399
Action          11901
Casual          10191
Adventure       10028
Strategy         5245
Simulation       5177
RPG              4309
Early Access     2908
Free to Play     1696
Sports           1322
dtype: int64

gen.value_counts().tail(10)

RPG                      4309
Early Access             2908
Free to Play             1696
Sports                   1322
Racing                   1024
Violent                   842
Massively Multiplayer     721
Gore                      536
Nudity                    266
Sexual Content            245
dtype: int64

df_corr = pd.DataFrame()
df_corr

for i in gen.unique().tolist():
    nome = 'genres_'+i
    df[nome] = df['genres'].str.contains(i, regex=False).astype(int)
    df_corr[nome] = df['genres'].str.contains(i, regex=False).astype(int)

df_corr.head(10)

df_corr.corr()

ax = sns.heatmap(df_corr.corr(), linewidths=.2, xticklabels=True, yticklabels=True)
ax.figure.set_size_inches(15,15)

Nesse caso, houveram apenas 2 "grandes" correlações.

Variável A	Variável B
genres_Nudity	genres_Sexual Content
genres_Violent	genres_Gore

Provavelmente teremos que escolher 1 variável de cada linha para utilizar na eventual regressão. Vamos para a próxima variável

df['achievements'].unique()

array([   0,   33,  147,   54,   13,   15,   22,  520,   73,   70,   51,
         66,  167,   44,  285,   12,   69,   21,   14,   37,   29,   67,
         61,   50,   80,  286,   60,   11,  709,   20,   38,   30,   40,
         10,   17,   25,   52,   71,   57,    7,   36,    9,   16,    4,
        106,  247,   58,    8,   87,   45,   27,   18,   92,   19,   43,
         75, 1165,  169,   56,   23,  260,   28,   59,   24,   26,   47,
        140,   97,  145,   49,  322,   64,   35,   62,   81,   41,   68,
         88,    1,   63,   48,   83,  129,  110,  114,   90,   77,   65,
         32,   42,  100,   46,  115,  124,  220,  251,  104,   99,  650,
         34,   31,   53,   85,  119,   98,  161,   74,  412,   91,   93,
        113,  849,  111,    2,  150,  213,   82,  107,  188,  146,   95,
       1130,  178,  103,  294,    6,  123,   89,   39,  210,  109,  141,
        187,  452,  257,  134,  108,  295,  300,  120,    3,  183,  105,
        490,   72,  199,  237,  403,  121,  159,   86, 1448,  175,   76,
       1746,  125,  131,   55,  152,    5,  127,  130,  236,   78,  256,
         79,  139,   84,  241,  323,  419,  165,  102,   96,  170,  126,
        319,  163,  118,  250,  361,  172,  116,   94,  282, 1559,  185,
        136,  112,  160,  198,  128,  176,  265,  179,  143,  354,  180,
        226,  208, 1080,  572,  135,  132, 1002,  408,  240, 1095,  157,
       4034,  202,  164,  608,  133, 5000,  101,  394,  334, 1317, 2880,
       1224,  142,  217,  162,  480,  224,  193, 1090,  206,  601,  229,
       2448,  137,  302, 9821, 1708, 2995,  484,  590,  464,  173,  303,
        151,  727,  201,  555,  400,  117,  500, 1009,  720, 4094, 4765,
       1043,  365,  315,  153,  505,  888,  184,  222, 5394,  678, 2130,
        225,  166, 2008,  252,  604,  270,  526, 2516,  168,  191,  243,
        350,  324, 1510,  272, 2007,  750, 1500,  674, 4896,  192,  219,
       3039,  362, 1215, 1176, 4564, 1596, 2090,  200,  961,  156,  602,
        348,  316,  309, 1011,  658, 1018,  332,  177, 4732, 1512,  998,
       2020,  186,  281, 2126, 1487, 4251,  360, 2134,  158, 2659, 4979,
       2001,  317, 1602,  777, 3000, 1315, 1819,  543, 2042, 2638,  154,
       4497, 1070, 1505,  212, 4999, 4013,  482,  299,  345,  204, 2306,
       1998, 1364,  428,  333, 2976, 3600,  296, 4884,  258, 1800, 2230,
        330, 2021,  291,  298,  235, 1266, 1113, 4996, 2220, 3006, 4293,
       1000,  284,  451, 1327,  800,  215, 1010,  283,  999,  232,  754,
       4981, 3340,  574,  510, 4989, 4757,  995,  279, 4956,  144,  337,
       4987,  190, 1970, 1172, 4977, 4585,  138, 1100,  565, 1730,  196,
        600, 4805, 3346, 1337, 3452,  357,  952, 1203, 1581, 4997,  630,
       2043,  308,  233], dtype=int64)

Pelo que pode ser visto, esse campo já está em formato numérico, o que facilita muita coisa. Podemos dar uma olhada no seu histograma, apenas pra ter uma noção melhor da distribuição dos dados.

ax = sns.histplot(data=df['achievements'])
ax.figure.set_size_inches(15,6)

Parece que o histograma não foi muito útil como forma de visualização desses dados. Vamos tentar o boxplot.

ax = sns.boxplot(data=df['achievements'])
ax.figure.set_size_inches(15,6)

df['achievements'].mean()

45.613536467257696

df['achievements'].std()

354.1483340105123

df['achievements'].var()

125421.04248242141

df['achievements'].max()

9821

df['achievements'].min()

0

df['achievements']

0         0
1         0
2         0
3         0
4         0
         ..
27070     7
27071     0
27072    24
27073     0
27074     0
Name: achievements, Length: 26846, dtype: int64

df['achievements'].dropna()

0         0
1         0
2         0
3         0
4         0
         ..
27070     7
27071     0
27072    24
27073     0
27074     0
Name: achievements, Length: 26846, dtype: int64

Bom, parece que esse campo é um pouco estranho mesmo. O desvio padrão é bem alto, o que indica que poucos valores de achievements se repetem entre os jogos, e que eles não estão exatamente próximos um dos outros.
Vamos pra próxima variável.

df['positive_ratings'].unique()

array([124534,   3318,   3416, ...,   1720,   2966,   1490], dtype=int64)

len(df['positive_ratings'].unique())

2791

df['positive_ratings'].max()

2644404

df['positive_ratings'].min()

0

df['positive_ratings'].mean()

1004.7550845563585

df['positive_ratings'].std()

19065.287190135652

df['positive_ratings'].var()

363485175.64235055

Tentei plotar os gráficos de histograma e boxplot, mas, não deu muito certo, muito tempo de processamento, preferi retirar da análise.

Esse campo diz qual a quantidade de análises positivas que um dado jogo tem. Ele tem um par, que é o campo 'negative_ratings'. Podemos verificar se existe alguma correlação forte entre eles.

df['negative_ratings'].unique()

array([3339,  633,  398, ..., 1780, 1332, 1035], dtype=int64)

len(df['negative_ratings'].unique())

1488

df['negative_ratings'].max()

487076

df['negative_ratings'].min()

0

df['negative_ratings'].mean()

212.34172688668704

df['negative_ratings'].std()

4303.099997223416

df['negative_ratings'].var()

18516669.58610416

Vamos a correlação

df[['negative_ratings','positive_ratings']].corr()

Como já era de se esperar, a correlação entre as avaliações realmente é consideravel, porém, é interessante notar que elas tem uma correlação positiva. Isso pode indicar que elas são, na verdade, proporcionais a quantidade de avaliações. Como elas tem uma correlação bem alta, vou criar uma terceira variável, que será a subtração dessas duas, e vou usar ela na eventual regressão.

df['delta_ratings'] = df['positive_ratings'] - df['negative_ratings']

df['delta_ratings'].unique()

array([121195,   2685,   3018, ...,    472,    600,    611], dtype=int64)

len(df['delta_ratings'].unique())

2712

df['delta_ratings'].max()

2242091

df['delta_ratings'].min()

-7738

df['delta_ratings'].mean()

792.4133576696714

df['delta_ratings'].std()

16025.730333753105

df['delta_ratings'].var()

256824032.73017442

Vou calcular também uma nota média baseada no percentual de votos positivos.

df['%_ratings'] = df['positive_ratings']/(df['positive_ratings'] + df['negative_ratings'])

df['%_ratings'].unique()

array([0.97388815, 0.8397874 , 0.89564761, ..., 0.64197531, 0.68209877,
       0.77692308])

len(df['%_ratings'].unique())

8402

df['%_ratings'].max()

1.0

df['%_ratings'].min()

0.0

df['%_ratings'].mean()

0.7144242187237336

Parece que encontramos um outro ponto de atenção. É bem provável que essas entradas com nota percentual 1 e 0 sejam, na verdade, casos com poucas avaliações. Será que o número de avaliações pode ter influência no seu preço? Na quantidade, eu imagino que tenha.

df['QTD_ratings'] = (df['positive_ratings'] + df['negative_ratings'])

Como comentado acima, vamos dar uma olhada nas entradas com '%_ratings' com valor 1 e 0

rating_1 = df[df['%_ratings'] == df['%_ratings'].max()][['name','positive_ratings','negative_ratings','owners','QTD_ratings']].copy()
rating_1

rating_0 = df[df['%_ratings'] == df['%_ratings'].min()][['name','positive_ratings','negative_ratings','owners','QTD_ratings']].copy()
rating_0

df[['delta_ratings','%_ratings','QTD_ratings']].corr()

Como existe uma correlação alta entre QTD_ratings e delta_ratings, vou usar só o QTD_ratings.

rating_1['owners'].unique()

array(['0-20000', '20000-50000'], dtype=object)

rating_0['owners'].unique()

array(['0-20000', '20000-50000'], dtype=object)

df['owners'].unique()

array(['10000000-20000000', '5000000-10000000', '2000000-5000000',
       '20000000-50000000', '100000000-200000000', '50000000-100000000',
       '20000-50000', '500000-1000000', '100000-200000', '50000-100000',
       '1000000-2000000', '200000-500000', '0-20000'], dtype=object)

Como visto acima, a maioria dessas entradas apresenta valores comparativamente baixos no campo "owners", que são as pessoas que detem esses jogos em suas bibliotecas. Podemos analisar e fazer alguns tratamentos no campo 'owners', pra, depois, ter uma ideia de qual o percentual médio de pessoas detentoras do jogo que o avaliam.

len(df['owners'].unique())

13

own_split = df['owners'].apply(lambda x: pd.Series(x.split('-')))

df['owners_mean'] = (own_split[0].astype(int)+own_split[1].astype(int))/2

df['owners_mean']

0        15000000.0
1         7500000.0
2         7500000.0
3         7500000.0
4         7500000.0
            ...    
27070       10000.0
27071       10000.0
27072       10000.0
27073       10000.0
27074       10000.0
Name: owners_mean, Length: 26846, dtype: float64

rating_1 = df[df['%_ratings'] == df['%_ratings'].max()][['name','positive_ratings','negative_ratings','owners','QTD_ratings','owners_mean']]

rating_0 = df[df['%_ratings'] == df['%_ratings'].min()][['name','positive_ratings','negative_ratings','owners','QTD_ratings','owners_mean']]

rating_1['owners_mean'].unique()

array([10000., 35000.])

rating_0['owners_mean'].unique()

array([10000., 35000.])

temp = df[['QTD_ratings','positive_ratings','owners_mean']].groupby('owners_mean').mean().round(2)
temp

temp.reset_index(inplace=True)

temp

temp['QTD_ratings'].astype(float)/temp['owners_mean'].astype(float)

0     0.004073
1     0.006722
2     0.006695
3     0.007103
4     0.006304
5     0.008434
6     0.009491
7     0.008655
8     0.010804
9     0.010032
10    0.010674
11    0.026867
12    0.006704
dtype: float64

(temp['QTD_ratings'].astype(float)/temp['owners_mean'].astype(float)).mean()

0.009427474183150182

temp['positive_ratings'].astype(float)/temp['owners_mean'].astype(float)

0     0.003074
1     0.005151
2     0.005140
3     0.005594
4     0.005011
5     0.006856
6     0.007854
7     0.007445
8     0.009231
9     0.008452
10    0.009857
11    0.020937
12    0.005757
dtype: float64

Podemos perceber que o percentual de owners que avalia os jogos não é exatamente constante entre todas as médias.

len(df['owners_mean'])

26846

df['owners_mean'].hist()

<AxesSubplot:>

len(df[df['owners_mean'] == df['owners_mean'].min()]['owners_mean'])

18433

df[df['owners_mean'] == df['owners_mean'].min()]['owners_mean']

122      10000.0
124      10000.0
145      10000.0
188      10000.0
248      10000.0
          ...   
27070    10000.0
27071    10000.0
27072    10000.0
27073    10000.0
27074    10000.0
Name: owners_mean, Length: 18433, dtype: float64

Parece que a grande maioria dos jogos nesse dataset tem, em média, 10 mil owners. Acho que vale a pena separar os dados em 3 grupos então, 1 somente com jogos com mais de 10 mil owners de média, um com todos os jogos, e outro com somente os jogos com média de 10 mil owners. Pra não atrapalhar o fluxo até agora, vou deixar pra fazer essa separação depois de ter efetuado as análises nas outras variáveis.

df["average_playtime"].unique()

array([17612,   277,   187, ...,  1100,  6332,   816], dtype=int64)

len(df["average_playtime"].unique())

1338

df["average_playtime"].head()

0    17612
1      277
2      187
3      258
4      624
Name: average_playtime, dtype: int64

df["average_playtime"].max()

190625

df["average_playtime"].min()

0

Parece que existem jogos que foram jogados, em média, 0 horas. Acho que é uma boa ideia dar uma olhadinha neles.

df[df["average_playtime"] == df["average_playtime"].min()][['name','average_playtime','median_playtime']]

df[df["average_playtime"] == df["average_playtime"].min()][['name','average_playtime','median_playtime']].shape

(20709, 3)

A grande maioria dos jogos não possui horas significativas de jogatina média ou mediana. Provavelmente não vai ajudar muito considerar o tempo de jogatina de cada jogo. Por via das dúvidas, vou criar uma coluna que apenas computa se o average_playtime é maior que 0.

df["average_playtime_>0"] = df["average_playtime"].apply(lambda x: x>0).astype(int)

ax = sns.histplot(df["average_playtime"], stat='probability')
ax.figure.set_size_inches(15,6)

Bom, por fim, falta avaliarmos a variável 'price'. Lembrando que os preços encontrados aqui estão em libra.

df['price'].unique().round(2)

array([7.1900e+00, 3.9900e+00, 0.0000e+00, 5.7900e+00, 5.9900e+00,
       1.4990e+01, 6.9900e+00, 2.2990e+01, 4.9900e+00, 1.0990e+01,
       2.9900e+00, 4.2900e+00, 1.2990e+01, 2.4900e+00, 9.9900e+00,
       1.9900e+00, 1.5990e+01, 3.4900e+00, 2.4990e+01, 8.5900e+00,
       7.9900e+00, 4.4900e+00, 4.2500e+00, 5.5900e+00, 2.5990e+01,
       8.9900e+00, 4.7900e+00, 1.3990e+01, 8.2900e+00, 9.2900e+00,
       1.9990e+01, 1.7900e+00, 1.6990e+01, 8.6900e+00, 7.4900e+00,
       7.9000e-01, 1.6900e+00, 1.5490e+01, 1.4900e+00, 1.7990e+01,
       1.1990e+01, 3.5960e+01, 6.4900e+00, 1.5900e+00, 3.4990e+01,
       9.9000e-01, 5.1900e+00, 2.0900e+00, 2.9990e+01, 1.9980e+01,
       2.7900e+00, 2.8900e+00, 3.4950e+01, 1.2490e+01, 1.1390e+01,
       4.3900e+00, 1.8990e+01, 1.0590e+01, 6.1900e+00, 3.9990e+01,
       2.6990e+01, 3.0990e+01, 6.0000e+00, 2.3990e+01, 3.3990e+01,
       1.2390e+01, 8.4900e+00, 2.3790e+01, 1.9490e+01, 3.8990e+01,
       2.3390e+01, 6.9500e+00, 1.1490e+01, 1.8740e+01, 2.0990e+01,
       5.4900e+00, 1.3490e+01, 1.7980e+01, 2.7790e+01, 2.7110e+01,
       4.4980e+01, 1.1690e+01, 2.2900e+00, 5.4990e+01, 4.4990e+01,
       2.7000e+01, 4.6500e+00, 1.4490e+01, 1.3500e+01, 1.3590e+01,
       1.6300e+01, 1.1000e+01, 4.9990e+01, 7.5000e+00, 3.2900e+00,
       1.0290e+01, 1.7490e+01, 1.3000e+01, 4.5000e+00, 2.7990e+01,
       8.8900e+00, 4.5900e+00, 6.0990e+01, 6.6900e+00, 3.1990e+01,
       3.2990e+01, 3.6990e+01, 9.0000e+00, 3.1900e+00, 5.6900e+00,
       1.5000e+00, 2.2490e+01, 1.0000e+00, 3.5000e+00, 4.1990e+01,
       2.5790e+01, 1.3370e+01, 7.0000e+00, 3.9950e+01, 6.3500e+00,
       6.1500e+00, 1.0000e+01, 3.2000e-01, 3.2500e+01, 2.5900e+00,
       7.5000e-01, 5.9000e-01, 1.9900e+01, 4.2400e+00, 3.2300e+00,
       1.1240e+01, 4.9000e-01, 5.8000e+00, 2.7100e+01, 2.1000e+01,
       2.2800e+00, 9.0000e-01, 6.2900e+00, 6.3900e+00, 4.1200e+00,
       2.1990e+01, 6.5900e+00, 5.0000e+00, 1.8390e+01, 3.9200e+00,
       1.5000e+01, 1.8100e+01, 7.2000e-01, 1.1950e+01, 4.9500e+00,
       1.5500e+00, 2.0510e+01, 1.1750e+01, 2.6900e+00, 1.2000e+01,
       1.5499e+02, 7.2000e+00, 5.5400e+00, 4.4800e+01, 1.4690e+01,
       3.1000e+00, 4.6900e+00, 2.5000e+00, 1.6500e+00, 9.9000e+00,
       1.8490e+01, 2.3000e+00, 7.5990e+01, 2.3900e+00, 5.9500e+00,
       7.9000e+00, 7.8000e-01, 1.5900e+01, 1.6660e+01, 7.7700e+00,
       1.0490e+01, 2.0000e+00, 4.6490e+01, 7.5900e+00, 3.5900e+00,
       2.8500e+00, 1.4950e+01, 3.7900e+00, 4.7000e+00, 7.7000e+00,
       6.7900e+00, 3.7990e+01, 7.0900e+00, 1.0790e+01, 1.6740e+01,
       4.0000e+00, 2.3600e+00, 2.2400e+00, 3.5300e+00, 8.9000e-01,
       2.4000e+01, 8.0000e+00, 8.0000e-01, 2.8000e+00, 1.9500e+00,
       2.4900e+01, 7.1200e+00, 2.0700e+00, 1.2580e+01, 3.0900e+00,
       1.4300e+00, 1.4210e+01, 3.0000e+00, 5.0500e+00, 3.0400e+00,
       7.4000e-01, 1.1890e+01, 2.1300e+00, 7.7400e+00, 2.4500e+01,
       3.8900e+00, 1.1000e+00, 6.2990e+01, 7.8990e+01, 1.6800e+00,
       2.1000e+00, 1.3000e+00, 1.9000e+00, 1.1900e+01, 7.1800e+00,
       8.9000e+00, 2.1900e+00, 1.9800e+00, 2.1500e+00, 4.9900e+01,
       1.0100e+00, 6.8900e+00, 4.2199e+02, 3.0300e+00, 6.2800e+00,
       1.2890e+01, 1.4000e+01, 3.1400e+00, 8.5000e-01, 1.1400e+01,
       1.0900e+01, 1.2000e+00, 1.5800e+01, 5.3400e+00, 1.1900e+00,
       1.7000e+00, 6.9990e+01, 4.2000e+00, 2.0000e+01, 6.2000e+00,
       1.7000e+01, 1.1290e+01, 1.1500e+01, 1.1790e+01, 8.0200e+00,
       2.2600e+00, 5.3900e+00, 9.9500e+00, 1.8000e+00, 3.9800e+00,
       1.5400e+01, 6.5400e+00, 3.1500e+00, 2.3500e+00, 1.8900e+01,
       3.0200e+00, 7.4200e+00, 7.7000e-01, 2.9500e+00, 5.2000e+00,
       8.5000e+00, 6.1000e+00])

len(df['price'].unique())

272

df['price'].hist()

<AxesSubplot:>

from pingouin import normality

normality(df['price'],method='normaltest')

df['log_price'] = np.log(df['price']+1)

df['log_price'].hist()

<AxesSubplot:>

df['log_log_price'] = np.log(df['log_price']+1)

df['log_log_price'].hist()

<AxesSubplot:>

Agora que todas as variáveis foram avaliadas, vou exportar 3 bases diferentes, baseado na média de owners, como dito anteriormente. Farei as regressões em outro arquivo.

Colunas = ['appid','name','Year','english','platforms_windows','platforms_mac','platforms_linux','genres_Action',
           'genres_Free to Play', 'genres_Strategy','genres_Adventure', 'genres_Indie', 'genres_RPG',
           'genres_Casual','genres_Simulation', 'genres_Racing', 'genres_Violent','genres_Massively Multiplayer',
           'genres_Nudity', 'genres_Sports',       'genres_Early Access', 'genres_Gore', 'genres_Sexual Content',
           'categories_Multi-player', 'categories_Online Multi-Player','categories_Local Multi-Player',
           'categories_Valve Anti-Cheat enabled','categories_Single-player', 'categories_Steam Cloud',
           'categories_Steam Achievements', 'categories_Steam Trading Cards','categories_Captions available',
           'categories_Partial Controller Support','categories_Includes Source SDK','categories_Cross-Platform Multiplayer',
           'categories_Stats','categories_Commentary available', 'categories_Includes level editor',
           'categories_Steam Workshop', 'categories_In-App Purchases','categories_Co-op',
           'categories_Full controller support','categories_Steam Leaderboards', 'categories_SteamVR Collectibles',
           'categories_Online Co-op', 'categories_Shared/Split Screen','categories_Local Co-op', 'categories_MMO', 
           'categories_VR Support','categories_Mods', 'categories_Mods (require HL2)',
           'categories_Steam Turn Notifications','achievements','%_ratings','QTD_ratings','owners_mean',
           'average_playtime_>0','price','log_price','log_log_price']

output_owners_min = df[df['owners_mean'] == df['owners_mean'].min()][Colunas]

output_owners = df[df['owners_mean'] != df['owners_mean'].min()][Colunas]

output_geral = df[Colunas]

output_owners_min.to_csv('Dados Tratados/output_owners_min.csv', sep=';')

output_owners.to_csv('Dados Tratados/output_owners.csv', sep=';')

output_geral.to_csv('Dados Tratados/output_geral.csv', sep=';')

Dados tratados e discriminados, agora vamos para as regressões... no arquivo "Regressões.ipynb" dentro da mesma pasta, ou, acessando ao link: https://bhscjhdvds.github.io/Minor/Regress%C3%B5es.html

	appid	english	required_age	achievements	positive_ratings	negative_ratings	average_playtime	median_playtime	price
count	2.707500e+04	27075.000000	27075.000000	27075.000000	2.707500e+04	27075.000000	27075.000000	27075.00000	27075.000000
mean	5.962035e+05	0.981127	0.354903	45.248864	1.000559e+03	211.027147	149.804949	146.05603	6.078193
std	2.508942e+05	0.136081	2.406044	352.670281	1.898872e+04	4284.938531	1827.038141	2353.88008	7.874922
min	1.000000e+01	0.000000	0.000000	0.000000	0.000000e+00	0.000000	0.000000	0.00000	0.000000
25%	4.012300e+05	1.000000	0.000000	0.000000	6.000000e+00	2.000000	0.000000	0.00000	1.690000
50%	5.990700e+05	1.000000	0.000000	7.000000	2.400000e+01	9.000000	0.000000	0.00000	3.990000
75%	7.987600e+05	1.000000	0.000000	23.000000	1.260000e+02	42.000000	0.000000	0.00000	7.190000
max	1.069460e+06	1.000000	18.000000	9821.000000	2.644404e+06	487076.000000	190625.000000	190625.00000	421.990000

	steam_appid	detailed_description	about_the_game	short_description
0	10	Play the world's number 1 online action game. ...	Play the world's number 1 online action game. ...	Play the world's number 1 online action game. ...
1	20	One of the most popular online action games of...	One of the most popular online action games of...	One of the most popular online action games of...
2	30	Enlist in an intense brand of Axis vs. Allied ...	Enlist in an intense brand of Axis vs. Allied ...	Enlist in an intense brand of Axis vs. Allied ...
3	40	Enjoy fast-paced multiplayer gaming with Death...	Enjoy fast-paced multiplayer gaming with Death...	Enjoy fast-paced multiplayer gaming with Death...
4	50	Return to the Black Mesa Research Facility as ...	Return to the Black Mesa Research Facility as ...	Return to the Black Mesa Research Facility as ...
5	60	A futuristic action game that challenges your ...	A futuristic action game that challenges your ...	A futuristic action game that challenges your ...
6	70	Named Game of the Year by over 50 publications...	Named Game of the Year by over 50 publications...	Named Game of the Year by over 50 publications...
7	80	With its extensive Tour of Duty campaign, a ne...	With its extensive Tour of Duty campaign, a ne...	With its extensive Tour of Duty campaign, a ne...
8	130	Made by Gearbox Software and originally releas...	Made by Gearbox Software and originally releas...	Made by Gearbox Software and originally releas...
9	220	1998. HALF-LIFE sends a shock through the game...	1998. HALF-LIFE sends a shock through the game...	1998. HALF-LIFE sends a shock through the game...

	steam_appid
count	2.733400e+04
mean	5.982886e+05
std	2.512113e+05
min	1.000000e+01
25%	4.030925e+05
50%	6.019650e+05
75%	8.011750e+05
max	1.069460e+06

	steam_appid	header_image	screenshots	background	movies
0	10	https://steamcdn-a.akamaihd.net/steam/apps/10/...	[{'id': 0, 'path_thumbnail': 'https://steamcdn...	https://steamcdn-a.akamaihd.net/steam/apps/10/...	NaN
1	20	https://steamcdn-a.akamaihd.net/steam/apps/20/...	[{'id': 0, 'path_thumbnail': 'https://steamcdn...	https://steamcdn-a.akamaihd.net/steam/apps/20/...	NaN
2	30	https://steamcdn-a.akamaihd.net/steam/apps/30/...	[{'id': 0, 'path_thumbnail': 'https://steamcdn...	https://steamcdn-a.akamaihd.net/steam/apps/30/...	NaN
3	40	https://steamcdn-a.akamaihd.net/steam/apps/40/...	[{'id': 0, 'path_thumbnail': 'https://steamcdn...	https://steamcdn-a.akamaihd.net/steam/apps/40/...	NaN
4	50	https://steamcdn-a.akamaihd.net/steam/apps/50/...	[{'id': 0, 'path_thumbnail': 'https://steamcdn...	https://steamcdn-a.akamaihd.net/steam/apps/50/...	NaN

	steam_appid	pc_requirements	mac_requirements	linux_requirements	minimum	recommended
0	10	{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...	{'minimum': 'Minimum: OS X Snow Leopard 10.6....	{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...	500 mhz processor, 96mb ram, 16mb video card, ...	NaN
1	20	{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...	{'minimum': 'Minimum: OS X Snow Leopard 10.6....	{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...	500 mhz processor, 96mb ram, 16mb video card, ...	NaN
2	30	{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...	{'minimum': 'Minimum: OS X Snow Leopard 10.6....	{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...	500 mhz processor, 96mb ram, 16mb video card, ...	NaN
3	40	{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...	{'minimum': 'Minimum: OS X Snow Leopard 10.6....	{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...	500 mhz processor, 96mb ram, 16mb video card, ...	NaN
4	50	{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...	{'minimum': 'Minimum: OS X Snow Leopard 10.6....	{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...	500 mhz processor, 96mb ram, 16mb video card, ...	NaN

	appid	name	release_date	english	developer	publisher	platforms	categories	genres	steamspy_tags	positive_ratings	negative_ratings	average_playtime	median_playtime	owners	price
0	10	Counter-Strike	2000-11-01	1	Valve	Valve	windows;mac;linux	Multi-player;Online Multi-Player;Local Multi-P...	Action	Action;FPS;Multiplayer	124534	3339	17612	317	10000000-20000000	7.19
1	20	Team Fortress Classic	1999-04-01	1	Valve	Valve	windows;mac;linux	Multi-player;Online Multi-Player;Local Multi-P...	Action	Action;FPS;Multiplayer	3318	633	277	62	5000000-10000000	3.99
2	30	Day of Defeat	2003-05-01	1	Valve	Valve	windows;mac;linux	Multi-player;Valve Anti-Cheat enabled	Action	FPS;World War II;Multiplayer	3416	398	187	34	5000000-10000000	3.99
3	40	Deathmatch Classic	2001-06-01	1	Valve	Valve	windows;mac;linux	Multi-player;Online Multi-Player;Local Multi-P...	Action	Action;FPS;Multiplayer	1273	267	258	184	5000000-10000000	3.99
4	50	Half-Life: Opposing Force	1999-11-01	1	Gearbox Software	Valve	windows;mac;linux	Single-player;Multi-player;Valve Anti-Cheat en...	Action	FPS;Action;Sci-fi	5250	288	624	415	5000000-10000000	3.99

	steam_appid	website	support_url	support_email
0	10	NaN	http://steamcommunity.com/app/10	NaN
1	30	http://www.dayofdefeat.com/	NaN	NaN
2	50	NaN	https://help.steampowered.com	NaN
3	70	http://www.half-life.com/	http://steamcommunity.com/app/70	NaN
4	80	NaN	http://steamcommunity.com/app/80	NaN

	appid	1980s	1990s	...	world_war_i	world_war_ii	e_sports
0	10	144	564	...	0	0	550
1	20	0	71	...	0	0	0
2	30	0	0	...	5	122	0
3	40	0	0	...	0	0	0
4	50	0	77	...	0	0	0

	appid	name	release_date	english	developer	publisher	platforms	required_age	categories	genres	steamspy_tags	achievements	positive_ratings	negative_ratings	average_playtime	median_playtime	owners	price	Year
1297	207370	eXceed - Gun Bullet Children	2012-08-02	0	Flat Software	Nyu Media	windows	0	Single-player;Steam Trading Cards	Action;Indie	Bullet Hell;Anime;Shoot 'Em Up	0	204	215	97	108	50000-100000	1.99	2012
2221	262300	Tsukumogami	2012-05-31	0	TORaIKI	Fruitbat Factory	windows	0	Single-player	Adventure;Indie;RPG	RPG;Indie;Adventure	49	33	17	0	0	50000-100000	22.99	2012
3001	292990	NOBUNAGA'S AMBITION: Souzou (Traditional Chine...	2014-08-01	0	KOEI TECMO GAMES CO., LTD.	KOEI TECMO GAMES CO., LTD.	windows	0	Single-player;Steam Cloud	Strategy	Strategy;Historical	0	413	74	0	0	20000-50000	17.99	2014
3085	295950	新流星搜劍錄	2017-12-20	0	Titans	Titans	windows	18	Multi-player;Online Multi-Player;MMO	Violent;Action;Adventure;Indie;Massively Multi...	Action;RPG;Fighting	0	664	323	0	0	0-20000	15.49	2017
3745	318300	Phineas and Ferb: New Inventions	2015-02-24	0	PIPE Studio	Disney Interactive	windows	0	Single-player	Adventure	Adventure;Psychological Horror;Sexual Content	0	5	7	0	0	0-20000	15.49	2015
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
27021	1056010	Не падай!	2019-04-17	0	SharF Co.	SharF Co.	windows	0	Single-player;Steam Achievements	Casual;Indie;Racing;Simulation	Casual;Indie;Simulation	3	1	1	0	0	0-20000	1.69	2019
27024	1056500	VRLife	2019-04-11	0	Wuhan Mandrill VR Co., Ltd.	Wuhan Mandrill VR Co., Ltd.	windows	0	Online Multi-Player;MMO	Casual;Free to Play;Massively Multiplayer;RPG;...	Early Access;Free to Play;Massively Multiplayer	0	16	3	0	0	0-20000	0.00	2019
27026	1056710	VTB Basketball League VR	2019-04-12	0	Axyos Games	LikeVR	windows	0	Single-player	Casual;Free to Play;Sports	Free to Play;Casual;Sports	0	3	0	0	0	0-20000	0.00	2019
27036	1058000	Rain's love memory-雨的恋记	2019-04-19	0	the original air	the original air	windows	0	Single-player	Free to Play	Free to Play;Visual Novel	0	33	2	0	0	0-20000	0.00	2019
27060	1062240	地铁：恐怖末班车	2019-04-25	0	QZQ Studio	H.P.Y.S,LLC	windows	0	Single-player	Action;Adventure;Indie;RPG	Action;Adventure;RPG	0	0	2	0	0	0-20000	2.09	2019

	appid	name	release_date	english	developer	publisher	platforms	categories	genres	...	positive_ratings	negative_ratings	owners	price	Year	platforms_windows	platforms_mac
27065	1063560	12 HOURS	2019-04-24	1	Second Reality	Second Reality	windows	Single-player	Action;Indie;Simulation;Early Access	...	9	1	0-20000	2.09	2019	1	0
27066	1064060	The Mystery of Bikini Island	2019-04-24	1	Velvet Paradise Games	Velvet Paradise Games	windows	Single-player	Adventure;Casual;Indie;RPG;Early Access	...	3	0	0-20000	0.79	2019	1	0
27067	1064580	CaptainMarlene	2019-04-23	1	Dnovel	Dnovel	windows	Single-player	Adventure;Indie;Early Access	...	13	0	0-20000	9.29	2019	1	0
27068	1064890	Old Edge II	2019-04-23	1	LTrust	LTrust	windows	Single-player	Adventure;Casual;Indie;Simulation	...	4	0	0-20000	0.79	2019	1	0
27069	1065160	Deatherem	2019-04-23	1	Anima	Team Syukino	windows	Single-player;Steam Achievements	Indie	...	2	2	0-20000	2.09	2019	1	0
27070	1065230	Room of Pandora	2019-04-24	1	SHEN JIAWEI	SHEN JIAWEI	windows	Single-player;Steam Achievements	Adventure;Casual;Indie	...	3	0	0-20000	2.09	2019	1	0
27071	1065570	Cyber Gun	2019-04-23	1	Semyon Maximov	BekkerDev Studio	windows	Single-player	Action;Adventure;Indie	...	8	1	0-20000	1.69	2019	1	0
27072	1065650	Super Star Blast	2019-04-24	1	EntwicklerX	EntwicklerX	windows	Single-player;Multi-player;Co-op;Shared/Split ...	Action;Casual;Indie	...	0	1	0-20000	3.99	2019	1	0
27073	1066700	New Yankee 7: Deer Hunters	2019-04-17	1	Yustas Game Studio	Alawar Entertainment	windows;mac	Single-player;Steam Cloud	Adventure;Casual;Indie	...	2	0	0-20000	5.19	2019	1	1
27074	1069460	Rune Lord	2019-04-24	1	Adept Studios GD	Alawar Entertainment	windows;mac	Single-player;Steam Cloud	Adventure;Casual;Indie	...	4	0	0-20000	5.19	2019	1	1

	platforms_windows	platforms_mac	platforms_linux
platforms_windows	1.000000	-0.014920	-0.007111
platforms_mac	-0.014920	1.000000	0.626648
platforms_linux	-0.007111	0.626648	1.000000

	0	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17
0	Multi-player	Online Multi-Player	Local Multi-Player	Valve Anti-Cheat enabled	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
1	Multi-player	Online Multi-Player	Local Multi-Player	Valve Anti-Cheat enabled	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2	Multi-player	Valve Anti-Cheat enabled	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
3	Multi-player	Online Multi-Player	Local Multi-Player	Valve Anti-Cheat enabled	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4	Single-player	Multi-player	Valve Anti-Cheat enabled	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
27070	Single-player	Steam Achievements	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
27071	Single-player	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
27072	Single-player	Multi-player	Co-op	Shared/Split Screen	Steam Achievements	Full controller support	Steam Cloud	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
27073	Single-player	Steam Cloud	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
27074	Single-player	Steam Cloud	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

	categories_Multi-player	categories_Online Multi-Player	categories_Local Multi-Player	categories_Valve Anti-Cheat enabled	categories_Single-player	categories_Steam Cloud	categories_Steam Achievements	categories_Steam Trading Cards	categories_Captions available	categories_Partial Controller Support	...	categories_QTD
0	1	1	1	1	0	0	0	0	0	0	...	2
1	1	1	1	1	0	0	0	0	0	0	...	2
2	1	0	0	1	0	0	0	0	0	0	...	2
3	1	1	1	1	0	0	0	0	0	0	...	2
4	1	0	0	1	1	0	0	0	0	0	...	2
5	1	1	0	1	0	0	0	0	0	0	...	2
6	1	1	0	1	1	1	0	0	0	0	...	2
7	1	0	0	1	1	0	0	0	0	0	...	2
8	0	0	0	0	1	0	0	0	0	0	...	1
9	0	0	0	0	1	1	1	1	1	1	...	2

	categories_Multi-player	categories_Online Multi-Player	categories_Local Multi-Player	categories_Valve Anti-Cheat enabled	categories_Single-player	categories_Steam Cloud	categories_Steam Achievements	categories_Steam Trading Cards	categories_Captions available	categories_Partial Controller Support	...	categories_SteamVR Collectibles	categories_Online Co-op	categories_Shared/Split Screen	categories_Local Co-op	categories_MMO	categories_VR Support	categories_Mods	categories_Mods (require HL2)	categories_Steam Turn Notifications	categories_QTD
categories_Multi-player	1.000000	0.503722	0.321236	0.122792	-0.268433	0.024880	0.008365	0.038045	-0.022577	0.049867	...	0.011220	0.327593	0.273619	0.156427	0.173948	0.022803	0.020723	0.014653	0.081779	0.223955
categories_Online Multi-Player	0.503722	1.000000	0.244950	0.048625	-0.364006	-0.009864	-0.005357	0.000474	-0.022421	0.022914	...	0.031050	0.541671	0.107472	0.088209	0.233930	0.006648	0.012146	-0.001933	0.056302	0.156664
categories_Local Multi-Player	0.321236	0.244950	1.000000	0.003693	-0.107629	0.042106	0.037201	0.000239	-0.019380	0.057296	...	-0.009688	0.125707	0.576333	0.532393	-0.020309	-0.008103	-0.002165	-0.001531	0.062281	0.136928
categories_Valve Anti-Cheat enabled	0.122792	0.048625	0.003693	1.000000	-0.085582	0.029730	0.027583	0.028312	0.017538	0.017807	...	0.014078	0.026680	-0.005737	0.001047	0.038259	0.015006	0.145617	0.102965	-0.002851	0.032608
categories_Single-player	-0.268433	-0.364006	-0.107629	-0.085582	1.000000	0.079473	0.083920	0.036539	0.021987	-0.015416	...	-0.008416	-0.157390	-0.066024	-0.023562	-0.340444	-0.011039	-0.036849	-0.026056	-0.009527	-0.041680
categories_Steam Cloud	0.024880	-0.009864	0.042106	0.029730	0.079473	1.000000	0.385004	0.295040	0.112453	0.023476	...	0.007252	0.015186	0.092394	0.067056	-0.062278	-0.005986	0.014255	0.010079	0.017686	0.330935
categories_Steam Achievements	0.008365	-0.005357	0.037201	0.027583	0.083920	0.385004	1.000000	0.337887	0.091724	0.055034	...	0.002165	0.018232	0.081981	0.066869	-0.049431	-0.003662	0.008227	0.005817	0.033948	0.574558
categories_Steam Trading Cards	0.038045	0.000474	0.000239	0.028312	0.036539	0.295040	0.337887	1.000000	0.063114	0.037730	...	0.000639	0.026572	0.068636	0.034050	0.013703	0.007455	-0.005526	-0.003907	0.031304	0.353633
categories_Captions available	-0.022577	-0.022421	-0.019380	0.017538	0.021987	0.112453	0.091724	0.063114	1.000000	0.002684	...	-0.000389	-0.010028	-0.018071	-0.010888	-0.015225	0.014588	-0.001422	-0.001005	0.001535	0.089208
categories_Partial Controller Support	0.049867	0.022914	0.057296	0.017807	-0.015416	0.023476	0.055034	0.037730	0.002684	1.000000	...	-0.016561	0.026876	0.073484	0.044791	-0.005619	0.016447	-0.003701	-0.002617	-0.018682	0.232807
categories_Includes Source SDK	0.051886	0.031266	0.016980	0.155159	-0.056663	0.015500	0.009742	-0.009571	0.064281	0.024131	...	-0.001384	0.019067	0.000829	0.003346	0.012095	0.019015	0.238900	0.168925	0.019597	0.017447
categories_Cross-Platform Multiplayer	0.374398	0.370801	0.100777	0.039278	-0.151986	0.010995	0.038840	0.058418	-0.012639	0.008286	...	0.021626	0.244127	0.087233	0.035730	0.154290	0.032361	0.020197	-0.001239	0.123273	0.112659
categories_Stats	0.110646	0.093364	0.056457	0.055558	-0.037524	0.147696	0.212999	0.099639	0.044234	0.034147	...	0.004638	0.063926	0.071334	0.051399	-0.004937	0.007867	0.014568	-0.001659	0.056210	0.150478
categories_Commentary available	-0.005935	-0.009190	-0.007696	0.021583	0.003283	0.022514	0.015095	0.031132	0.098297	0.002071	...	-0.002813	-0.004419	-0.008346	-0.009514	0.011331	0.004259	-0.000628	-0.000444	0.028087	0.040397
categories_Includes level editor	0.113675	0.069886	0.040000	0.060239	-0.030067	0.090900	0.064109	0.063499	0.037564	0.040811	...	0.002353	0.072127	0.056699	0.040200	-0.001726	0.038018	0.043090	0.030469	0.026331	0.105190
categories_Steam Workshop	0.103998	0.078313	0.025700	0.101343	-0.040783	0.129659	0.091247	0.106010	0.020653	0.029389	...	0.003626	0.084188	0.037157	0.029715	-0.006585	0.043410	0.046432	0.032832	0.021042	0.094462
categories_In-App Purchases	0.143530	0.292627	-0.010052	0.042253	-0.247277	-0.042387	-0.017876	0.020718	-0.007824	-0.007035	...	-0.000118	0.153554	-0.026724	-0.016911	0.314963	-0.009905	-0.001390	-0.000983	0.031104	0.084908
categories_Co-op	0.426840	0.320879	0.323010	0.070195	-0.128345	0.066205	0.067465	0.067324	-0.003380	0.068216	...	0.026591	0.622128	0.383019	0.618491	0.149355	0.009239	0.011772	0.018631	0.007646	0.179916
categories_Full controller support	0.060719	0.004983	0.188761	0.012677	0.024928	0.249732	0.262679	0.203712	0.042411	-0.218466	...	-0.008055	0.025444	0.277864	0.206736	-0.050947	0.043760	-0.004436	-0.003137	0.016455	0.283196
categories_Steam Leaderboards	0.090653	0.043821	0.058479	0.032171	0.044847	0.219241	0.292489	0.134663	-0.002466	0.037014	...	0.019982	0.033554	0.129848	0.078662	-0.034492	0.021296	-0.003279	-0.002318	0.041432	0.209940
categories_SteamVR Collectibles	0.011220	0.031050	-0.009688	0.014078	-0.008416	0.007252	0.002165	0.000639	-0.000389	-0.016561	...	1.000000	0.041529	-0.011303	-0.007761	0.002938	0.174176	-0.000331	-0.000234	-0.001858	0.021250
categories_Online Co-op	0.327593	0.541671	0.125707	0.026680	-0.157390	0.015186	0.018232	0.026572	-0.010028	0.026876	...	0.041529	1.000000	0.062960	0.194615	0.187380	0.012078	-0.001744	-0.001233	0.009863	0.112115
categories_Shared/Split Screen	0.273619	0.107472	0.576333	-0.005737	-0.066024	0.092394	0.081981	0.068636	-0.018071	0.073484	...	-0.011303	0.062960	1.000000	0.508408	-0.031412	-0.012412	-0.002526	-0.001786	0.036821	0.162335
categories_Local Co-op	0.156427	0.088209	0.532393	0.001047	-0.023562	0.067056	0.066869	0.034050	-0.010888	0.044791	...	-0.007761	0.194615	0.508408	1.000000	-0.019197	-0.008359	-0.001734	-0.001226	0.010027	0.111010
categories_MMO	0.173948	0.233930	-0.020309	0.038259	-0.340444	-0.062278	-0.049431	0.013703	-0.015225	-0.005619	...	0.002938	0.187380	-0.031412	-0.019197	1.000000	0.007816	-0.001080	-0.000764	0.012518	0.058146
categories_VR Support	0.022803	0.006648	-0.008103	0.015006	-0.011039	-0.005986	-0.003662	0.007455	0.014588	0.016447	...	0.174176	0.012078	-0.012412	-0.008359	0.007816	1.000000	-0.000797	-0.000564	0.003855	0.045555
categories_Mods	0.020723	0.012146	-0.002165	0.145617	-0.036849	0.014255	0.008227	-0.005526	-0.001422	-0.003701	...	-0.000331	-0.001744	-0.002526	-0.001734	-0.001080	-0.000797	1.000000	0.707094	-0.000415	0.004748
categories_Mods (require HL2)	0.014653	-0.001933	-0.001531	0.102965	-0.026056	0.010079	0.005817	-0.003907	-0.001005	-0.002617	...	-0.000234	-0.001233	-0.001786	-0.001226	-0.000764	-0.000564	0.707094	1.000000	-0.000294	0.003357
categories_Steam Turn Notifications	0.081779	0.056302	0.062281	-0.002851	-0.009527	0.017686	0.033948	0.031304	0.001535	-0.018682	...	-0.001858	0.009863	0.036821	0.010027	0.012518	0.003855	-0.000415	-0.000294	1.000000	0.026680
categories_QTD	0.223955	0.156664	0.136928	0.032608	-0.041680	0.330935	0.574558	0.353633	0.089208	0.232807	...	0.021250	0.112115	0.162335	0.111010	0.058146	0.045555	0.004748	0.003357	0.026680	1.000000

	genres_Action	genres_Free to Play	genres_Strategy	genres_Adventure	genres_Indie	genres_RPG	genres_Casual	genres_Simulation	genres_Racing	genres_Violent	genres_Massively Multiplayer	genres_Nudity	genres_Sports	genres_Early Access	genres_Gore	genres_Sexual Content
0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
2	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
3	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
4	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
5	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
6	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
7	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
8	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
9	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0

	genres_Action	genres_Free to Play	genres_Strategy	genres_Adventure	genres_Indie	genres_RPG	genres_Casual	genres_Simulation	genres_Racing	genres_Violent	genres_Massively Multiplayer	genres_Nudity	genres_Sports	genres_Early Access	genres_Gore	genres_Sexual Content
genres_Action	1.000000	-0.025227	-0.156237	0.012327	0.097356	-0.029661	-0.173166	-0.140261	0.022335	0.099691	0.053049	-0.023407	-0.010067	0.102506	0.077398	-0.040695
genres_Free to Play	-0.025227	1.000000	0.030372	-0.043528	-0.051140	0.044122	0.003528	0.003859	-0.007747	-0.012468	0.286455	-0.004336	0.005295	0.026745	-0.012984	0.008891
genres_Strategy	-0.156237	0.030372	1.000000	-0.142592	-0.025821	0.079887	-0.015110	0.144437	-0.064775	-0.029917	0.044826	-0.030324	-0.020096	0.043788	-0.031379	-0.028519
genres_Adventure	0.012327	-0.043528	-0.142592	1.000000	0.064279	0.167907	-0.039149	-0.100087	-0.075781	0.039973	-0.008250	0.024597	-0.115238	-0.019387	0.049975	0.012538
genres_Indie	0.097356	-0.051140	-0.025821	0.064279	1.000000	0.022734	0.103029	-0.043002	-0.032554	0.005045	-0.066387	-0.001858	-0.039328	0.075940	0.027172	0.004342
genres_RPG	-0.029661	0.044122	0.079887	0.167907	0.022734	1.000000	-0.135654	-0.034968	-0.058467	0.003407	0.119434	0.019778	-0.068092	0.064402	0.002153	0.010325
genres_Casual	-0.173166	0.003528	-0.015110	-0.039149	0.103029	-0.135654	1.000000	0.057542	-0.012311	-0.062811	-0.040214	0.024817	0.026661	-0.068142	-0.057875	0.030669
genres_Simulation	-0.140261	0.003859	0.144437	-0.100087	-0.043002	-0.034968	0.057542	1.000000	0.085044	-0.018619	0.029179	0.005438	0.187221	0.090294	-0.023868	0.010677
genres_Racing	0.022335	-0.007747	-0.064775	-0.075781	-0.032554	-0.058467	-0.012311	0.085044	1.000000	-0.014635	0.024658	-0.015994	0.267448	0.025705	-0.011741	-0.015021
genres_Violent	0.099691	-0.012468	-0.029917	0.039973	0.005045	0.003407	-0.062811	-0.018619	-0.014635	1.000000	0.005799	0.193451	-0.020211	0.033552	0.655706	0.153530
genres_Massively Multiplayer	0.053049	0.286455	0.044826	-0.008250	-0.066387	0.119434	-0.040214	0.029179	0.024658	0.005799	1.000000	-0.009640	0.014371	0.100754	-0.002298	-0.013520
genres_Nudity	-0.023407	-0.004336	-0.030324	0.024597	-0.001858	0.019778	0.024817	0.005438	-0.015994	0.193451	-0.009640	1.000000	-0.021029	-0.010665	0.157789	0.599445
genres_Sports	-0.010067	0.005295	-0.020096	-0.115238	-0.039328	-0.068092	0.026661	0.187221	0.267448	-0.020211	0.014371	-0.021029	1.000000	0.044756	-0.020177	-0.018220
genres_Early Access	0.102506	0.026745	0.043788	-0.019387	0.075940	0.064402	-0.068142	0.090294	0.025705	0.033552	0.100754	-0.010665	0.044756	1.000000	0.024796	-0.022106
genres_Gore	0.077398	-0.012984	-0.031379	0.049975	0.027172	0.002153	-0.057875	-0.023868	-0.011741	0.655706	-0.002298	0.157789	-0.020177	0.024796	1.000000	0.106715
genres_Sexual Content	-0.040695	0.008891	-0.028519	0.012538	0.004342	0.010325	0.030669	0.010677	-0.015021	0.153530	-0.013520	0.599445	-0.018220	-0.022106	0.106715	1.000000

	delta_ratings	%_ratings	QTD_ratings
delta_ratings	1.00000	0.035700	0.955830
%_ratings	0.03570	1.000000	0.025694
QTD_ratings	0.95583	0.025694	1.000000

	QTD_ratings	positive_ratings
owners_mean
10000.0	40.73	30.74
35000.0	235.27	180.27
75000.0	502.16	385.48
150000.0	1065.38	839.09
350000.0	2206.54	1753.69
750000.0	6325.22	5142.36
1500000.0	14237.02	11781.62
3500000.0	30292.18	26056.07
7500000.0	81027.04	69231.24
15000000.0	150476.43	126773.57
35000000.0	373593.33	344998.00
75000000.0	2014988.50	1570294.00
150000000.0	1005586.00	863507.00

	owners_mean	QTD_ratings	positive_ratings
0	10000.0	40.73	30.74
1	35000.0	235.27	180.27
2	75000.0	502.16	385.48
3	150000.0	1065.38	839.09
4	350000.0	2206.54	1753.69
5	750000.0	6325.22	5142.36
6	1500000.0	14237.02	11781.62
7	3500000.0	30292.18	26056.07
8	7500000.0	81027.04	69231.24
9	15000000.0	150476.43	126773.57
10	35000000.0	373593.33	344998.00
11	75000000.0	2014988.50	1570294.00
12	150000000.0	1005586.00	863507.00

	appid	1980s	1990s	...	world_war_i	world_war_ii	e_sports
0	10	144	564	...	0	0	550
1	20	0	71	...	0	0	0
2	30	0	0	...	5	122	0
3	40	0	0	...	0	0	0
4	50	0	77	...	0	0	0

	categories_Multi-player	categories_Online Multi-Player	categories_Local Multi-Player	categories_Valve Anti-Cheat enabled	categories_Single-player	categories_Steam Cloud	categories_Steam Achievements	categories_Steam Trading Cards	categories_Captions available	categories_Partial Controller Support	...	categories_QTD
0	1	1	1	1	0	0	0	0	0	0	...	2
1	1	1	1	1	0	0	0	0	0	0	...	2
2	1	0	0	1	0	0	0	0	0	0	...	2
3	1	1	1	1	0	0	0	0	0	0	...	2
4	1	0	0	1	1	0	0	0	0	0	...	2
5	1	1	0	1	0	0	0	0	0	0	...	2
6	1	1	0	1	1	1	0	0	0	0	...	2
7	1	0	0	1	1	0	0	0	0	0	...	2
8	0	0	0	0	1	0	0	0	0	0	...	1
9	0	0	0	0	1	1	1	1	1	1	...	2

	genres_Action	genres_Free to Play	genres_Strategy	genres_Adventure	genres_Indie	genres_RPG	genres_Casual	genres_Simulation	genres_Racing	genres_Violent	genres_Massively Multiplayer	genres_Nudity	genres_Sports	genres_Early Access	genres_Gore	genres_Sexual Content
0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
2	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
3	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
4	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
5	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
6	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
7	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
8	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
9	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0

	negative_ratings	positive_ratings
negative_ratings	1.000000	0.762909
positive_ratings	0.762909	1.000000

	name	positive_ratings	negative_ratings	owners	QTD_ratings
679	Kuros	7	0	0-20000	7
694	Wandering Willows™	20	0	0-20000	20
700	Fitness Dash™	8	0	0-20000	8
713	Build-A-Lot 2: Town of the Year	3	0	0-20000	3
714	Build-A-Lot 3: Passport to Europe	2	0	0-20000	2
...	...	...	...	...	...
27067	CaptainMarlene	13	0	0-20000	13
27068	Old Edge II	4	0	0-20000	4
27070	Room of Pandora	3	0	0-20000	3
27073	New Yankee 7: Deer Hunters	2	0	0-20000	2
27074	Rune Lord	4	0	0-20000	4

	name	positive_ratings	negative_ratings	owners	QTD_ratings
577	Raycatcher	0	13	0-20000	13
683	Angle of Attack	0	2	0-20000	2
4503	Pluck	0	3	0-20000	3
4742	ProtoMasons	0	2	0-20000	2
4757	Vol'Talkes - The AI War	0	2	0-20000	2
...	...	...	...	...	...
27008	DevilShaft: TheTower	0	1	0-20000	1
27051	Space Trade Fleet 1.5	0	1	0-20000	1
27056	Hello, World.	0	1	0-20000	1
27060	地铁：恐怖末班车	0	2	0-20000	2
27072	Super Star Blast	0	1	0-20000	1

	name	average_playtime	median_playtime
26	Rag Doll Kung Fu	0	0
29	SiN Episodes: Emergence	0	0
34	Dangerous Waters	0	0
35	Space Empires IV Deluxe	0	0
36	Disciples II: Rise of the Elves	0	0
...	...	...	...
27070	Room of Pandora	0	0
27071	Cyber Gun	0	0
27072	Super Star Blast	0	0
27073	New Yankee 7: Deer Hunters	0	0
27074	Rune Lord	0	0

	appid	1980s	1990s	...	world_war_i	world_war_ii	e_sports
0	10	144	564	...	0	0	550
1	20	0	71	...	0	0	0
2	30	0	0	...	5	122	0
3	40	0	0	...	0	0	0
4	50	0	77	...	0	0	0

	categories_Multi-player	categories_Online Multi-Player	categories_Local Multi-Player	categories_Valve Anti-Cheat enabled	categories_Single-player	categories_Steam Cloud	categories_Steam Achievements	categories_Steam Trading Cards	categories_Captions available	categories_Partial Controller Support	...	categories_QTD
0	1	1	1	1	0	0	0	0	0	0	...	2
1	1	1	1	1	0	0	0	0	0	0	...	2
2	1	0	0	1	0	0	0	0	0	0	...	2
3	1	1	1	1	0	0	0	0	0	0	...	2
4	1	0	0	1	1	0	0	0	0	0	...	2
5	1	1	0	1	0	0	0	0	0	0	...	2
6	1	1	0	1	1	1	0	0	0	0	...	2
7	1	0	0	1	1	0	0	0	0	0	...	2
8	0	0	0	0	1	0	0	0	0	0	...	1
9	0	0	0	0	1	1	1	1	1	1	...	2

	genres_Action	genres_Free to Play	genres_Strategy	genres_Adventure	genres_Indie	genres_RPG	genres_Casual	genres_Simulation	genres_Racing	genres_Violent	genres_Massively Multiplayer	genres_Nudity	genres_Sports	genres_Early Access	genres_Gore	genres_Sexual Content
0	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
1	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
2	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
3	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
4	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
5	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
6	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
7	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
8	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0
9	1	0	0	0	0	0	0	0	0	0	0	0	0	0	0	0