[Python/Pandas/Numpy] Filling in missing values in Dataframe based on values from another Dataframe

BlueChinchillaEatingDorito · November 3, 2020

So given this Pandas Dataframe, what I want to do is to fill in missing NaN cells with values from another dataframe based on the values of that column for that particular class.

So for instance the first row is part of class 1, so its NaN value would be replaced with V as that's the value of the corresponding column 2 in the Class Dataframe for Class 1.

I'm a bit lost on how I can perform this using Pandas. Is there a particular function in the Pandas library anyone would recommend for filling in values like this?

Slottr · November 3, 2020

It's been a hot minute since I've touched pandas, but iirc you can use the .at accessor on a dataframe object (df.at[row, column]) to read and write individual cells, and use that to traverse the sheet like an array
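A rough sketch of what that cell-by-cell traversal could look like. The two frames here are made-up stand-ins for the ones in the question (the names df, classes and filled are assumptions), and a plain loop like this will be slow on large frames:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the frames in the question
df = pd.DataFrame(
    {
        "1": ["A", "B", np.nan, "C", "A"],
        "2": [np.nan, "C", "C", "V", np.nan],
        "3": ["N", np.nan, np.nan, "N", "M"],
        "Class": [1, 2, 1, 1, 2],
    }
)
# Index the class table by class so .at can look up (class, column) directly
classes = pd.DataFrame(
    {"1": ["A", "B"], "2": ["V", "C"], "3": ["N", "M"], "Class": [1, 2]}
).set_index("Class")

filled = df.copy()
for row in filled.index:
    cls = filled.at[row, "Class"]
    for col in classes.columns:
        # Only overwrite cells that are actually missing
        if pd.isna(filled.at[row, col]):
            filled.at[row, col] = classes.at[cls, col]

print(filled)
```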

lars-petter · November 11, 2020

Hi,

I would use the existing fillna functionality of pandas (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.fillna.html). There you can provide a dict that maps each column name to the value used to fill that column:

>>> df
     A    B   C  D
0  NaN  2.0 NaN  0
1  3.0  4.0 NaN  1
2  NaN  NaN NaN  5
3  NaN  3.0 NaN  4

>>> values = {'A': 0, 'B': 1, 'C': 2, 'D': 3}
>>> df.fillna(value=values)
     A    B    C  D
0  0.0  2.0  2.0  0
1  3.0  4.0  2.0  1
2  0.0  1.0  2.0  5
3  0.0  3.0  2.0  4

Doing this per class, given a substitution table like the one the author provided, takes a bit more work:

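import numpy as np
import pandas as pd


def custom_fillna(dataframe, substitution, reference_column, inplace=False):
    # Work on a copy unless the caller explicitly asks for in-place modification
    if not inplace:
        dataframe = dataframe.copy()
    # Column name -> list of fill values, one entry per reference value
    substitution = substitution.to_dict("list")

    references = substitution.pop(reference_column)
    # We do a per reference value replacement
    for idx, ref_value in enumerate(references):
        # Fill values for this reference value, keyed by column name
        inject = {key: val[idx] for key, val in substitution.items()}
        # Rows of the dataframe that belong to this reference value
        mask = dataframe[reference_column] == ref_value
        dataframe.loc[mask] = dataframe.loc[mask].fillna(inject)

    return dataframe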

df = pd.DataFrame(
    {
        "1": ["A", "B", np.nan, "C", "A"],
        "2": [np.nan, "C", "C", "V", np.nan],
        "3": ["N", np.nan, np.nan, "N", "M"],
        "Class": [1, 2, 1, 1, 2],
    }
)

substitution = pd.DataFrame(
    {
        "1": ["A", "B"],
        "2": ["V", "C"],
        "3": ["N", "M"],
        "Class": [1, 2],
    }
)

result = custom_fillna(df, substitution=substitution, reference_column="Class")

result
   1  2  3  Class
0  A  V  N      1
1  B  C  M      2
2  A  C  N      1
3  C  V  N      1
4  A  C  M      2
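
If the loop over classes becomes a bottleneck, a loop-free variant might be worth trying. This is only a sketch of a different technique (a left merge followed by fillna with a DataFrame), not the function above; the name fill is made up for illustration, and it assumes each class appears exactly once in the substitution table:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "1": ["A", "B", np.nan, "C", "A"],
        "2": [np.nan, "C", "C", "V", np.nan],
        "3": ["N", np.nan, np.nan, "N", "M"],
        "Class": [1, 2, 1, 1, 2],
    }
)
substitution = pd.DataFrame(
    {"1": ["A", "B"], "2": ["V", "C"], "3": ["N", "M"], "Class": [1, 2]}
)

# One row of fill values per row of df, looked up by class.
# A left merge keeps the row order of the left frame.
fill = df[["Class"]].merge(substitution, on="Class", how="left")
# merge returns a fresh RangeIndex, so realign it with df's index
fill.index = df.index

# fillna with a DataFrame only fills missing cells, aligned on index and columns
result = df.fillna(fill)
print(result)
```

A class that is missing from the substitution table simply leaves its NaN cells untouched here, since the left merge yields NaN fill values for it.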