PowerShell for a Data Scientist 1

Behind the Scenes

I was writing some new functions in Python 3 to clean up a large, messy data set (10+ GB). I wanted a small subset of the original file (~100 KB) to test my functions and scripts on. I was on my laptop, which runs Windows 10, so I decided to use Windows PowerShell to make the subset.

You can achieve this with a one-liner.


The following one-liner does the trick.

Write-Output (Get-Content .\my_big_data.csv -TotalCount 20) > my_subset.csv

The structure of the command is as follows:

Write-Output INPUT > OUTPUT

INPUT is the expression in parentheses, which reads the file but returns only its first 20 lines. OUTPUT is the file that the redirection operator > writes those lines to.
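One thing to watch for if the subset is going to be read back in Python: in Windows PowerShell 5.1 the > operator writes UTF-16 by default, which can surprise a CSV reader that expects UTF-8. Below is a minimal sketch of the same idea wrapped in a small function that sets the output encoding explicitly. The function name Copy-FirstLines and its parameter names are my own invention for illustration, not a built-in cmdlet.

```powershell
# Copy-FirstLines is a hypothetical helper, not a built-in cmdlet.
function Copy-FirstLines {
    param(
        [Parameter(Mandatory)] [string] $Path,         # source file
        [Parameter(Mandatory)] [string] $Destination,  # subset file to create
        [int] $Count = 20                              # number of lines to keep
    )

    # -TotalCount stops reading after $Count lines, so the rest of the
    # large file is never read.
    # Set-Content -Encoding UTF8 writes UTF-8 instead of the UTF-16 that
    # > produces in Windows PowerShell 5.1.
    Get-Content -Path $Path -TotalCount $Count |
        Set-Content -Path $Destination -Encoding UTF8
}
```

With the file names from the one-liner, the call would be Copy-FirstLines -Path .\my_big_data.csv -Destination .\my_subset.csv. Note that in Windows PowerShell 5.1 the UTF8 encoding includes a byte order mark, so on the Python side opening the file with encoding="utf-8-sig" is a safe choice.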

Hope this helps!

PS1: What is your favorite way of subsetting a huge file (on Linux/Unix or Windows)?

PS2: Take a look at this very nice post on Windows PowerShell.