Any Excel specialists in here?

How do I change my excel file from this:

To this :slight_smile:

Any way to automate the process?

Hmmm… I’m not an Excel expert but have you tried looking into recording a macro for this? I’m not even sure if a macro can do this sort of stuff. Otherwise you can write a Python or Ruby script to do this for you. How many rows are we talking about?

1 Like

About 213,000

I tried python, and it’s kinda failing. So maybe you can help with a ruby script?

Shorter path: Maybe we can fix your Python code together, with the help of @Emp_Sel. Why’s your code failing?

Longer path: I can write a Ruby script for you, but I’ll have to revise a few Ruby Excel libraries first.

Which path do you want? Do you have a deadline?

I’ll try something on my end. Can you share your python code with me?

Am open to both. We can try the first, since am sure I may be missing just something little in the code
My deadline is 14th December.

OK so you may have to share your code and the Excel sheet, If you can’t share the sheet publicly, you can message it. I’m more interested in your Python algorithm and also to be able to see the whole set of info in the sheet.

lemme share my code.

import pandas as pd
from openpyxl.workbook import Workbook 

#loading file 
df = pd.read_excel("test data.xlsx")
df.head()

#checking the column name
df.get('PartName', default=1)

#splitting data into separate dataframes 
df1 = pd.DataFrame(df['PartName']).reset_index(drop=True)
df2 = pd.DataFrame({'a1': df['Substitutes'], 'a2':df['SubstituteParts']}).reset_index(drop=True)

#merging columns on to PartName
left_merged = df1.merge(df2[['a1','a2']], on='PartName', how='left')

This should do the work…In theory.

However I get KeyError:

Here’s the traceback:

KeyError Traceback (most recent call last)
in
----> 1 left_merged = df1.merge(df2[[‘a1’,‘a2’]], on=‘PartName’, how=‘left’, validate=‘one_to_one’,indicator=True)
2 #merged = pd.concat([df1,df2])
3 #full = pd.merge(df1, df2.drop_duplicates(subset=[‘a1’,‘a2’], keep=‘first’), left_on=‘a1’, right_on=‘a2’,how=‘left’).drop_duplicates()

~\anaconda3\envs\reggie_workspace\lib\site-packages\pandas\core\frame.py in merge(self, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
9184 from pandas.core.reshape.merge import merge
9185
→ 9186 return merge(
9187 self,
9188 right,

~\anaconda3\envs\reggie_workspace\lib\site-packages\pandas\core\reshape\merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
105 validate: str | None = None,
106 ) → DataFrame:
→ 107 op = _MergeOperation(
108 left,
109 right,

~\anaconda3\envs\reggie_workspace\lib\site-packages\pandas\core\reshape\merge.py in init(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
698 self.right_join_keys,
699 self.join_names,
→ 700 ) = self._get_merge_keys()
701
702 # validate the merge keys dtypes. We may need to coerce

~\anaconda3\envs\reggie_workspace\lib\site-packages\pandas\core\reshape\merge.py in _get_merge_keys(self)
1090 if not is_rkey(rk):
1091 if rk is not None:
→ 1092 right_keys.append(right._get_label_or_level_values(rk))
1093 else:
1094 # work-around for merge_asof(right_index=True)

~\anaconda3\envs\reggie_workspace\lib\site-packages\pandas\core\generic.py in _get_label_or_level_values(self, key, axis)
1777 values = self.axes[axis].get_level_values(key)._values
1778 else:
→ 1779 raise KeyError(key)
1780
1781 # Check for duplicates

KeyError: ‘PartName’

1 Like

Am trying to share the excel file but its failing. Its not the right format

Try again, should work now. By the way looks like you only pasted the stack trace. Where’s the .py code?

EDIT:

Never mind, I see it now. Would help if you formatted your code.

Ok. I was looking for a way to that :grinning:

Also, you need to upload the Excel sheet. It’s important to note that I’ll only have time to write a script during the weekend, if that’d be OK.

Please upload it here on OF.

Ok. But am still getting an error message for .xlsx

No problem. Will appreciate the knowledge nonetheless

Cool. So expect a response around 19-20.12. You still haven’t uploaded the Excel sheet. :wink:

Still not working oo…

test data.xlsx (2.5 MB)
Excellent…Uploaded now

Nice, I’ll have a look at this on Sunday. :ok_hand:t5:

1 Like

Any good news on the progress you have made?