Jump to content

Powershell - Compare 1 column from two CSV's

Fallen Soul
Go to solution Solved by Fallen Soul,

So according to the post i had posted on Stack Exchange, i need to state which properties to compare. 
https://stackoverflow.com/questions/67051420/powershell-compare-two-files-not-showing-correct-data

 

Compare-Object -ReferenceObject $term1 -DifferenceObject $term2 -Property 'parent ID','Folder Name','Folder Code 1' |  Where-Object {$_.SideIndicator -eq '=>'}

 

While working on this further, i discovered that there was some trailing white space in the Folder code.

 

Update: 

While finalising this job I discovered, that using all columns in the -property option, is not the correct  method. This will compare all data which is not the best option. Well at least for this task. What I needed to do was select one column and then use the -PassThru switch after. 

 

So finally part of the code will look like this:

Compare-Object -ReferenceObject $term1 -DifferenceObject $term2 -Property 'parent ID'  |  Where-Object {$_.SideIndicator -eq '=>'}

 

Hi, 

I am trying to work out how to do this, I have two csv files and need to compare the data from column 1. 

My Current Code is as followed: 

$mycc = Import-csv -Path 'C:\temp\New Classes\MyCC-classes.csv' 
$tass = Import-Csv -Path 'C:\temp\New Classes\TASS-classes.csv'


$newClass = Compare-Object -IncludeEqual -ReferenceObject $mycc.name -DifferenceObject $tass.name | Where-Object {$_.SideIndicator -eq '=>'} | Select-Object -ExpandProperty InputObject


I have worked out checking the one column but i need the data that comes back to show both columns in the output. 
 

 

Also i have tried comparing the two csv's without defining the column but it seems to not compare correctly and i am unsure as to why. 

 

 

If i have missed anything , please let me know. 

 

Any help on this would be good and appreciated. 

 

Thanks,

Link to comment
Share on other sites

Link to post
Share on other sites

37 minutes ago, Franck said:

are you forced to use powershell ?

Nope. Eventually this task will be automated. But for now it's just so I can see what new classes have been setup, so I can create the new class pages. 

 

While I had been waiting on this thread I have had written more code that queries the MySQL DB and the MSSQL DB instead of using CSV exports. So powershell would be preferred. 

Link to comment
Share on other sites

Link to post
Share on other sites

Well if you know any full feature language that is very easy, maybe 10-15 lines of code tops. Again it depend on the compare you want. Does both CSV have same amount of records ? are they compared row 1 vs row 1 ? Do you need to sort beforehand ?

Link to comment
Share on other sites

Link to post
Share on other sites

Quote

Does both CSV have same amount of records ?

No, they will have different amount of records but same header names. (Parent ID, Folder Name, Folder Code 1, Folder Code 2, Folder Code 3, YearLevel

 

However the only columns that have data is the first three. But I only need to compare the data under Folder Name. But when the data is different to output all column data. This will allow me to upload the CSV with the new classes and there for will create the pages in the right areas etc. 

 

Quote

Do you need to sort beforehand ?

I would say sorting before hand might be a good idea as the sort by function in MySQL vs MSSQL seems too not sort the same, even though I have asked both to ORDER BY 'Folder Code 1'

Link to comment
Share on other sites

Link to post
Share on other sites

So if file 1 contains 5  rows and file 2 contains 7 rows do you need to filter and find the related row in the second file or is row 5 in file 1 comparing directly to row 5 in file 2 ?

Link to comment
Share on other sites

Link to post
Share on other sites

13 minutes ago, Franck said:

So if file 1 contains 5  rows and file 2 contains 7 rows do you need to filter and find the related row in the second file or is row 5 in file 1 comparing directly to row 5 in file 2 ?

I just need to find what rows that are in file 2 that are not in file 1. 

 

So basically file 1 is a export of all the current class pages that are already in the LMS system. File 2 is the extraction of the classes that are time tabled. 

 

Link to comment
Share on other sites

Link to post
Share on other sites

18 minutes ago, abass44 said:

I would say sorting before hand might be a good idea as the sort by function in MySQL vs MSSQL seems too not sort the same, even though I have asked both to ORDER BY 'Folder Code 1'

No order by unless totally constrained will never ever return you the same data. Totally constrain mean that if with the small about of order by  column you pick there are still 2 record that get the same matching index they order between them will be random. Also typically the order remain on equality the fastest record to be fetch will show before the other. This will change for each different hard drive your data is stored on. If in RAM tables it will be the lowest memory pointer first

Link to comment
Share on other sites

Link to post
Share on other sites

2 minutes ago, abass44 said:

just need to find what rows that are in file 2 that are not in file 1. 

So the column Folder Name does not have duplicates. If the value is ABC then there is no other row with ABC ( in the same file) ?

Link to comment
Share on other sites

Link to post
Share on other sites

9 minutes ago, Franck said:

So the column Folder Name does not have duplicates. If the value is ABC then there is no other row with ABC ( in the same file) ?

Yes that is correct. There should be no duplicates in the same folder.

Link to comment
Share on other sites

Link to post
Share on other sites

10 minutes ago, abass44 said:

Yes that is correct. There should be no duplicates. 

then it's simple you can if you don't have it download Microsoft Visual Studio Community and create a C# console application and use the following code.

 

 static void Main(string[] args)
        {
            // read 3 parameters, it assume the order of csv 1, csv 2 and output file
            var filePath1 = args[0]; // or if you simply want to run the code in visual studio   var filePath1 = "c:\\folder\\file1.csv"
            var filePath2 = args[1]; // or if you simply want to run the code in visual studio   var filePath2 = "c:\\folder\\file2.csv"
            var outputFile = args[2]; // or if you simply want to run the code in visual studio   var outputFile = "c:\\folder\\output.csv"

            var columnIndexToCheck = 1; // put the index of the column to compare here. first column is 0, second 1...

            // load all lines of both csv and extrat right away the column you will compare
            var file1Lines = System.IO.File.ReadAllLines(filePath1).Select(l => new { CompareColumn = l.Split(new[] { "," }, StringSplitOptions.None)[columnIndexToCheck], Line = l }).ToList();
            var file2Lines = System.IO.File.ReadAllLines(filePath2).Select(l => new { CompareColumn = l.Split(new[] { "," }, StringSplitOptions.None)[columnIndexToCheck], Line = l }).ToList();

            // will contain the lines to output
            var output = new List<string>();

            // for each lines in csv #2
            foreach (var file2Line in file2Lines)
            {
                // if we do not find a matching file in csv 1 that has the same value we put the file 2 line to be in the output result file
                if (!file1Lines.Any(f1 => f1.CompareColumn == file2Line.CompareColumn))
                {
                    output.Add(file2Line.Line);
                }
            }

            // write the output file
            System.IO.File.WriteAllLines(outputFile, output);

            // show a message saying it's done
            Console.WriteLine("completed");
        } 

 

then you build that exe and you can use command line it for automation

 

TheExecutableFile.exe "c:\csv1.csv" "c:\csv2.csv" "c:\output.csv" 

 

Link to comment
Share on other sites

Link to post
Share on other sites

3 minutes ago, Franck said:

then it's simple you can if you don't have it download Microsoft Visual Studio Community and create a C# console application and use the following code.

 


 static void Main(string[] args)
        {
            // read 3 parameters, it assume the order of csv 1, csv 2 and output file
            var filePath1 = args[0]; // or if you simply want to run the code in visual studio   var filePath1 = "c:\\folder\\file1.csv"
            var filePath2 = args[1]; // or if you simply want to run the code in visual studio   var filePath2 = "c:\\folder\\file2.csv"
            var outputFile = args[2]; // or if you simply want to run the code in visual studio   var outputFile = "c:\\folder\\output.csv"

            var columnIndexToCheck = 1; // put the index of the column to compare here. first column is 0, second 1...

            // load all lines of both csv and extrat right away the column you will compare
            var file1Lines = System.IO.File.ReadAllLines(filePath1).Select(l => new { CompareColumn = l.Split(new[] { "," }, StringSplitOptions.None)[columnIndexToCheck], Line = l }).ToList();
            var file2Lines = System.IO.File.ReadAllLines(filePath2).Select(l => new { CompareColumn = l.Split(new[] { "," }, StringSplitOptions.None)[columnIndexToCheck], Line = l }).ToList();

            // will contain the lines to output
            var output = new List<string>();

            // for each lines in csv #2
            foreach (var file2Line in file2Lines)
            {
                // if we do not find a matching file in csv 1 that has the same value we put the file 2 line to be in the output result file
                if (!file1Lines.Any(f1 => f1.CompareColumn == file2Line.CompareColumn))
                {
                    output.Add(file2Line.Line);
                }
            }

            // write the output file
            System.IO.File.WriteAllLines(outputFile, output);

            // show a message saying it's done
            Console.WriteLine("completed");
        } 

 

then you build that exe and you can use command line it for automation

 


TheExecutableFile.exe "c:\csv1.csv" "c:\csv2.csv" "c:\output.csv" 

 

Thanks. I will have to give this a go when I am back at work on Monday. 

 

However, would you happen to know how to do what I am wanting in powershell as well? 

Link to comment
Share on other sites

Link to post
Share on other sites

35 minutes ago, abass44 said:

However, would you happen to know how to do what I am wanting in powershell as well? 

unfortunately no. I know tons of fully features language so powershell is a waste of time as it's way longer to code something and way to limited to waste time for me. I use it for small batch simply commands but even there it's usually less line of code to write in C++ / Java or C#.

Link to comment
Share on other sites

Link to post
Share on other sites

7 hours ago, Franck said:

unfortunately no. I know tons of fully features language so powershell is a waste of time as it's way longer to code something and way to limited to waste time for me. I use it for small batch simply commands but even there it's usually less line of code to write in C++ / Java or C#.

Okay thanks. Maybe someone else out there might have an idea. But I will try your option Monday. 

Link to comment
Share on other sites

Link to post
Share on other sites

So according to the post i had posted on Stack Exchange, i need to state which properties to compare. 
https://stackoverflow.com/questions/67051420/powershell-compare-two-files-not-showing-correct-data

 

Compare-Object -ReferenceObject $term1 -DifferenceObject $term2 -Property 'parent ID','Folder Name','Folder Code 1' |  Where-Object {$_.SideIndicator -eq '=>'}

 

While working on this further, i discovered that there was some trailing white space in the Folder code.

 

Update: 

While finalising this job I discovered, that using all columns in the -property option, is not the correct  method. This will compare all data which is not the best option. Well at least for this task. What I needed to do was select one column and then use the -PassThru switch after. 

 

So finally part of the code will look like this:

Compare-Object -ReferenceObject $term1 -DifferenceObject $term2 -Property 'parent ID'  |  Where-Object {$_.SideIndicator -eq '=>'}

 

Link to comment
Share on other sites

Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

×