Jump to content

pandas fill in missing data

Go to solution Solved by vorticalbox,
40 minutes ago, tikker said:

Ah yeah Pandas is neat like that.

I have found a better way :) create a second data frame from start to end of the first, fill with 0 then merge with the original df

 

    payments = pd.read_csv('payments.csv', header=0)
    df = pd.DataFrame()
    df['ds'] = pd.to_datetime(payments['date'])
    df['y'] = payments['total']

    df2 = pd.DataFrame((pd.date_range(start=df.ds.min(), end=df.ds.max(), freq='1D')), columns=["ds"])
    df = pd.merge(df2, df, on="ds", how="left").fillna(0)

 

say i have a csv

date,total
2017-07-20,30
2017-07-21,40
2017-07-25,6
2017-08-05,25
2017-08-18,50

 

how can i get

date,total
2017-07-20,30
2017-07-21,40
2017-07-22,0
2017-07-23,0
2017-07-24,0
2017-07-25,6
    payments = ps.read_csv('payments.csv', header=0)
    df = ps.DataFrame()
    df['ds'] = ps.to_datetime(payments['date'])
    df['y'] = payments['total']

 

I ma planning to use fbprophet to do some predictions.

                     ¸„»°'´¸„»°'´ Vorticalbox `'°«„¸`'°«„¸
`'°«„¸¸„»°'´¸„»°'´`'°«„¸Scientia Potentia est  ¸„»°'´`'°«„¸`'°«„¸¸„»°'´

Link to comment
https://linustechtips.com/topic/1249482-pandas-fill-in-missing-data/
Share on other sites

Link to post
Share on other sites

You could write a loop that checks expected dates and creates an entry if it doesn't exist.

Crystal: CPU: i7 7700K | Motherboard: Asus ROG Strix Z270F | RAM: GSkill 16 GB@3200MHz | GPU: Nvidia GTX 1080 Ti FE | Case: Corsair Crystal 570X (black) | PSU: EVGA Supernova G2 1000W | Monitor: Asus VG248QE 24"

Laptop: Dell XPS 13 9370 | CPU: i5 10510U | RAM: 16 GB

Server: CPU: i5 4690k | RAM: 16 GB | Case: Corsair Graphite 760T White | Storage: 19 TB

Link to post
Share on other sites

1 minute ago, tikker said:

You could write a loop that checks expected dates and creates an entry if it doesn't exist.

that i could but i just found this magic
 

 df = df.set_index('date').asfreq('d', fill_value=0)

 

                     ¸„»°'´¸„»°'´ Vorticalbox `'°«„¸`'°«„¸
`'°«„¸¸„»°'´¸„»°'´`'°«„¸Scientia Potentia est  ¸„»°'´`'°«„¸`'°«„¸¸„»°'´

Link to post
Share on other sites

5 minutes ago, vorticalbox said:

that i could but i just found this magic
 


 df = df.set_index('date').asfreq('d', fill_value=0)

 

Ah yeah Pandas is neat like that.

Crystal: CPU: i7 7700K | Motherboard: Asus ROG Strix Z270F | RAM: GSkill 16 GB@3200MHz | GPU: Nvidia GTX 1080 Ti FE | Case: Corsair Crystal 570X (black) | PSU: EVGA Supernova G2 1000W | Monitor: Asus VG248QE 24"

Laptop: Dell XPS 13 9370 | CPU: i5 10510U | RAM: 16 GB

Server: CPU: i5 4690k | RAM: 16 GB | Case: Corsair Graphite 760T White | Storage: 19 TB

Link to post
Share on other sites

40 minutes ago, tikker said:

Ah yeah Pandas is neat like that.

I have found a better way :) create a second data frame from start to end of the first, fill with 0 then merge with the original df

 

    payments = pd.read_csv('payments.csv', header=0)
    df = pd.DataFrame()
    df['ds'] = pd.to_datetime(payments['date'])
    df['y'] = payments['total']

    df2 = pd.DataFrame((pd.date_range(start=df.ds.min(), end=df.ds.max(), freq='1D')), columns=["ds"])
    df = pd.merge(df2, df, on="ds", how="left").fillna(0)

 

                     ¸„»°'´¸„»°'´ Vorticalbox `'°«„¸`'°«„¸
`'°«„¸¸„»°'´¸„»°'´`'°«„¸Scientia Potentia est  ¸„»°'´`'°«„¸`'°«„¸¸„»°'´

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×