Jump to content

[Python/Pandas/Numpy] Filling in missing values in Dataframe based on values from another Dataframe

So given this Pandas Dataframe, what I want to do is to fill in missing NaN cells with values from another dataframe based on the values of that column for that particular class.

 

image.png.20302f7b59a2c74ad358c0055afc58f3.png

 

So for instance the first row is part of class 1, so its NaN value would be replaced with V as that's the value of the corresponding column 2 in the Class Dataframe for Class 1. 

 

I'm a bit lost on how I can perform this using Pandas. Is there are particular function in the Pandas library anyone would recommend that could perform this filling in of values?  

Intel® Core™ i7-12700 | GIGABYTE B660 AORUS MASTER DDR4 | Gigabyte Radeon™ RX 6650 XT Gaming OC | 32GB Corsair Vengeance® RGB Pro SL DDR4 | Samsung 990 Pro 1TB | WD Green 1.5TB | Windows 11 Pro | NZXT H510 Flow White
Sony MDR-V250 | GNT-500 | Logitech G610 Orion Brown | Logitech G402 | Samsung C27JG5 | ASUS ProArt PA238QR
iPhone 12 Mini (iOS 17.2.1) | iPhone XR (iOS 17.2.1) | iPad Mini (iOS 9.3.5) | KZ AZ09 Pro x KZ ZSN Pro X | Sennheiser HD450bt
Intel® Core™ i7-1265U | Kioxia KBG50ZNV512G | 16GB DDR4 | Windows 11 Enterprise | HP EliteBook 650 G9
Intel® Core™ i5-8520U | WD Blue M.2 250GB | 1TB Seagate FireCuda | 16GB DDR4 | Windows 11 Home | ASUS Vivobook 15 
Intel® Core™ i7-3520M | GT 630M | 16 GB Corsair Vengeance® DDR3 |
Samsung 850 EVO 250GB | macOS Catalina | Lenovo IdeaPad P580

Link to comment
Share on other sites

Link to post
Share on other sites

It's been a hot minute since I've touched pandas, but iirc you can use the .at() method on a dataframe object and use that to traverse the sheet like an array 

Community Standards || Tech News Posting Guidelines

---======================================================================---

CPU: R5 3600 || GPU: RTX 3070|| Memory: 32GB @ 3200 || Cooler: Scythe Big Shuriken || PSU: 650W EVGA GM || Case: NR200P

Link to comment
Share on other sites

Link to post
Share on other sites

  • 2 weeks later...

Hi,

 

I wanted to use the existing fillna functionality of pandas (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.fillna.html). There you can provide a dict with the to-fill values:

 

>>> df
     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5
3  NaN  3.0 NaN  4

>>> values = {'A': 0, 'B': 1, 'C': 2, 'D': 3}
>>> df.fillna(value=values)
    A   B   C   D
0   0.0 2.0 2.0 0
1   3.0 4.0 2.0 1
2   0.0 1.0 2.0 5
3   0.0 3.0 2.0 4

A bit of overhead to do it on a per-subset part given a dictionary as provided by the author:

 

def custom_fillna(dataframe, substitution, reference_column, inplace=False):
    if not inplace:
        dataframe = dataframe.copy()
    substitution = substitution.to_dict("list")

    references = substitution.pop(reference_column)
    # We do a per reference value replacement
    for idx, ref_value in enumerate(references):
        inject = {key: val[idx] for key, val in substitution.items()}
        mask = dataframe[reference_column] == ref_value
        dataframe.loc[mask] = dataframe.loc[mask].fillna(inject)

    return dataframe

df = pd.DataFrame(
    {
        "1": ["A", "B", np.nan, "C", "A"],
        "2": [np.nan, "C", "C", "V", np.nan],
        "3": ["N", np.nan, np.nan, "N", "M"],
        "Class": [1, 2, 1, 1, 2],
    }
)

substitution = pd.DataFrame(
	{
        "1": ["A", "B"],
        "2": ["V", "C"],
        "3": ["N", "M"],
        "Class": [1, 2],
	}
)

result = fillna(df, substitution=substitution, reference_column="Class")

result
   1  2  3  Class
0  A  V  N      1
1  B  C  M      2
2  A  C  N      1
3  C  V  N      1
4  A  C  M      2

 

Link to comment
Share on other sites

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×