sed and grep
Some of my one-liners and probably more as time pushes on.
cat images.html | grep -o -e '\(href\)="[^"]*" [^>]*><img src=' \
  | sed -n 's/href="\([^"]*\)".*/https:\1/p' \
  | wget --random-wait --wait 60 --no-verbose --input-file=-
Special prize for anyone who can tell me what website I wrote this one-liner for.
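To see what the grep and sed stages produce before anything reaches wget, here is a small sketch run against two made-up lines of HTML. The example.com URLs and the markup are assumptions for illustration, not taken from the real site, and the protocol-relative href is only a guess based on the https: prefix that sed adds.

```shell
# One hypothetical thumbnail link and one plain text link (made-up markup).
sample='<a href="//example.com/full/1.jpg" class="t"><img src="//example.com/thumb/1.jpg"></a>
<a href="//example.com/about" class="n">About</a>'

# Same grep and sed stages as the one-liner, minus the wget download.
links=$(printf '%s\n' "$sample" \
    | grep -o -e '\(href\)="[^"]*" [^>]*><img src=' \
    | sed -n 's/href="\([^"]*\)".*/https:\1/p')

printf '%s\n' "$links"
```

Only the anchor that wraps an image survives the grep, and sed turns its href into a full https URL for wget to fetch.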
Inefficient File List Creation on Cygwin
This file list is for 7z.exe on Windows 7, so you will notice that I have replaced the / characters with \ characters.
find . -type f | sed '/.*\.class/d' | sed '/\/.git\//d' \
  | sed 's|^\./||' | sed 's|/|\\|g' | sed '/^workspace-.*\.7z$/d' \
  | sed '/Debug\\/d' | sed '/^\.metadata\\/d' > filelist.txt
This does the following, in order:
- removes the .class files,
- removes the .git directories,
- removes the leading ./ from all lines,
- replaces / with \,
- removes workspace-*.7z,
- removes all Debug directories,
- removes the .metadata directory.
More could be done, and it could be more efficient, but this does work
for my workspace backup. You could say I don't need it or that I should
just back up the .git directories... you would probably be correct, but
I try to keep all my important files in workspace and some of them, I
hate to admit, are not under version control.
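One way it could be made less wasteful is to hand all the expressions to a single sed process instead of seven. The following is only a sketch of that idea, lightly tidied and not the command I actually run: make_filelist is a hypothetical helper name, and the demo tree with its file names is made up. Note that in this sketch the .metadata pattern matches a backslash, since that filter runs after the separators have been converted.

```shell
#!/bin/sh
# make_filelist is a hypothetical helper name used only for this sketch.
# It applies the same filters, in the same order, inside one sed process.
make_filelist() {
    find . -type f | sed \
        -e '/\.class/d' \
        -e '/\/\.git\//d' \
        -e 's|^\./||' \
        -e 's|/|\\|g' \
        -e '/^workspace-.*\.7z$/d' \
        -e '/Debug\\/d' \
        -e '/^\.metadata\\/d'
}

# Try it on a throwaway tree (all file names here are made up).
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/bin" "$demo/.git" "$demo/Debug" "$demo/.metadata"
touch "$demo/src/A.java" "$demo/bin/A.class" "$demo/.git/config" \
      "$demo/Debug/out.o" "$demo/.metadata/x" "$demo/workspace-2012.7z" \
      "$demo/notes.txt"

# Run inside the demo tree; sort only makes the order predictable.
filelist=$(cd "$demo" && make_filelist | sort)
printf '%s\n' "$filelist"
rm -r "$demo"
```

In the real workspace you would call make_filelist > filelist.txt from the workspace root, exactly where the original pipeline is run.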
Converting a mixed \r\n and \n file to \n
sed 's/\r$//'
Yes, the $ is the end of the line ;-)
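A quick way to convince yourself it works is to build a small mixed file and count the carriage returns that survive. This is a sketch only: mixed.txt and unix.txt are made-up names, and it assumes GNU sed (as on Cygwin or Linux), because the \r escape is a GNU extension.

```shell
# Work in a throwaway directory so nothing real gets overwritten.
work=$(mktemp -d)

# mixed.txt is a made-up sample containing both kinds of line ending.
printf 'one\r\ntwo\nthree\r\n' > "$work/mixed.txt"

# Strip a carriage return only when it is the last character on the line.
sed 's/\r$//' "$work/mixed.txt" > "$work/unix.txt"

# Count the lines that still end in a carriage return.
cr=$(printf '\r')
remaining=$(grep -c "${cr}\$" "$work/unix.txt" || true)
echo "lines still ending in CR: $remaining"

converted=$(cat "$work/unix.txt")
rm -r "$work"
```

Lines that already ended in a plain \n pass through untouched, which is why this is safe on a file with mixed endings.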
Hacking the HTML for LCTHW
I am going on holiday and wanted to read through LCTHW.
# First, download it
wget -r -l 1 -p http://c.learncodethehardway.org/book/

# Then move the html to this dir...
mv c.learncodethehardway.org/book/* .
rm -r c.learncodethehardway.org

# Now hack up the files.
ls -1 *.html | xargs -I {} sed -i -e '2,+5d' -e '36,54d' -e 's/ <!--<![endif]-->//' {}
ls -1 *.html | xargs -I {} echo "tac {} | sed -e '3,+47d' > {}.bak" > run.sh
ls -1 *.html | xargs -I {} echo "tac {}.bak > {}" > run2.sh
. run.sh
. run2.sh
rm *.bak
You can now put this on your ebook reader for simple offline reading on an aircraft (or at your girlfriend's house, on a train, on a hike up a hill in Yorkshire, you get the idea).
I know I removed the copyright notice; it was mixed up with some Google page-tracking JavaScript. As you know, this book is not my own but the property of Zed A. Shaw.
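For anyone puzzled by the tac steps in the script above: they delete lines counted from the end of each file by reversing it, deleting by line number, and reversing it back. Here is a minimal sketch of that idea on the numbers 1 to 10 rather than an HTML file. It assumes GNU seq, tac and sed, since the addr,+N address form is a GNU extension.

```shell
# Reverse the input, delete lines 2 to 4 of the reversed text
# (the 2nd, 3rd and 4th lines from the end), then reverse it back.
trimmed=$(seq 1 10 | tac | sed -e '2,+2d' | tac)

printf '%s\n' "$trimmed"
```

The numbers 7, 8 and 9 disappear while 10 stays, in the same way the script keeps the last two lines of each page and drops the 48 before them.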
As it turned out, I needed to do quite a bit more fiddling to get this
to work on my Sony eReader PRS-T1. For those that are interested, I have
converted the HTML into XHTML and created an epub file. As this is
freely available and an unfinished work, I don't think Zed will
mind. Download the epub file here: lcthw.epub.
To create the .epub file I used a Java project I found on
GitHub: automated_digital_publishing.
That project now seems to have moved to GitLab: automated_digital_publishing.