Behind the Scenes
I was writing some new functions in Python 3 to clean up a large, messy data set (10+ GB), and I wanted a small subset of the original file (~100 KB) to test my functions and scripts. I was on my laptop, which runs Windows 10, so I decided to use Windows PowerShell to make the subset.
You can achieve this with the following one-liner.
Write-Output (Get-Content .\my_big_data.csv -totalcount 20) > my_subset.csv
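The structure of the code is as follows:
Write-Output INPUT > OUTPUT
For INPUT, we have an expression in parentheses that says: get the content of the file, but only the first 20 lines (that is what -totalcount 20 does). The > operator then redirects those lines into the OUTPUT file.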
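Since the subset is meant for testing Python 3 code anyway, here is a rough Python equivalent of the same idea, in case PowerShell is not at hand. It is only a sketch: the function name write_head and its parameters are made up for this example, and the file names are the ones from the one-liner above.

```python
from itertools import islice


def write_head(src_path, dst_path, n_lines=20):
    """Copy the first n_lines lines of src_path into dst_path.

    Hypothetical helper for illustration. Returns the number of lines copied.
    """
    copied = 0
    # Binary mode copies the bytes unchanged, so no encoding has to be guessed.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # islice stops after n_lines, so the rest of the big file is never read.
        for line in islice(src, n_lines):
            dst.write(line)
            copied += 1
    return copied


# Example call (assumes my_big_data.csv exists in the current directory):
# write_head("my_big_data.csv", "my_subset.csv", 20)
```

One difference worth knowing: in Windows PowerShell 5.1 the > operator saves its output as UTF-16 by default, while this sketch keeps the original bytes of the file as they are.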
Hope this helps!
PS1: What is your favorite way of subsetting a huge file (in Linux/Unix or Windows)?
PS2: Take a look at this very nice post on Windows PowerShell.