Improving elements: devices, games, bonuses, etc.

I added new rows for the data_generator.py script: adding devices, bonuses, probabilities to win, and lose. Created the .csv dataset of simulated sessions.

Main Goals of the day:

  • Inserting key variables in the .py script: win formula, probabilities for deposits and bonuses.
  • Creating a .csv dataset of the simulated 500 sessions.
  • Create a script for the daily report based on the dataset, and create the .csv daily report file.

Step by Step

πŸ“ Step 1: Verifying the probability distributions used

πŸ“ Step 2: Created the dataset .csv file to be used for SQL analysis

πŸ“ Step 3: Wrote a script to summarize the KPI daily factor for the simulated dataset.

πŸ“ Step 4: Created the .csv file to visualise data.

Challenges / Insights

  • Discovering suitable probability distributions for the dataset (e.g, uniform or exponential).
  • Adding key parameters for the simulation on the cycle (bonus, win, deposit…).
  • Printed the .csv dataset and daily report file on the /data folder, to be used for statistical analysis.

Code Snippet Final

```python
df = pd.read_csv('../data/player_sessions.csv')
df['session_start'] = pd.to_datetime(df['session_start'])
df['day'] = df['session_start'].dt.date

report = df.groupby('day').agg(
    total_sessions=('player_id', 'count'),
    total_deposit=('deposit_amount', 'sum'),
    avg_session_duration=('session_end', lambda x: (pd.to_datetime(x) - pd.to_datetime(df.loc[x.index, 'session_start'])).mean().total_seconds()/60),
    churn_7d=('player_id', lambda x: len(set(df[df['session_start'] < (pd.Timestamp.today() - pd.Timedelta(days=7))]['player_id']) & set(x)))  # semplificato
).reset_index()
```

Next Step

πŸ‘‰ Running the SQL script to analyse statistics, using player_sessions.csv daily_kpi_report.csv files.