Posted by CANbike on Wed, 13 Nov 2013

[Bash, sed] Delete HTML Tags, Files, Subdirectories, etc

The following is a brief list of frequently used commands for writing Bash scripts. It is intended as a quick reference guide. In addition, for file and string variable manipulation, some sed commands are listed.

For complete documentation see the online Bash manual and the online sed manual.

Bash:Arithmetic Operators (Integers Only)
Bash:Boolean Operators
Bash:Floating Point Operators
Bash:Integer Comparison Operators
Bash:String Comparison Operators
Bash Script:Declaration
Bash Script:Execute Another Shell Script
Bash Script:Multiple Command Line Arguments
Conditionals & Loops:Brackets
Conditionals & Loops:If Then
Conditionals & Loops:If Then Else
Conditionals & Loops:If Then Else If Else
Conditionals & Loops:While Loop
File:Add to the End of a Line
File:Delete Empty Lines
File:Delete HTML Tags
File:Delete HTML Tag and Inner Text
File:Delete Lines Not Containing Keyword
File:Extract Portion of Line After Keyword
File:For Each File in Folder
File:For Each File in Subdirectory of Folders
File:Insert at the Beginning of a Line
File:Search and Replace
Input/Output:File Concatenation
Input/Output:File Overwriting
Input/Output:Read a File Line By Line
Other Commands:Date & Time
Other Commands:Download Web Page
Variable:Convert to Lowercase
Variable:Convert to Uppercase
Variable:Search & Replace
Variable:Substring Extraction
Documentation:sed


[Bash] Arithmetic Operators (Integers Only)

+    Plus
-    Minus
*    Multiplication
/    Division
**   Exponentiation
%    Modulo (remainder of an integer division)
+=   Plus-equal (increment variable by a constant)
-=   Minus-equal (decrement variable by a constant)
*=   Times-equal (multiply variable by a constant)
/=   Slash-equal (divide variable by a constant)
%=   mod-equal (remainder of dividing variable by a constant)

Use these with the let command or within double parentheses:

VARIABLE=$(( 5 + 3 ))
let "VARIABLE += 1"
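
As a minimal sketch of the operators listed above (the variable names are placeholders chosen for illustration, not part of the original list), the following can be pasted into a Bash session:

```shell
# Placeholder variable names, chosen for illustration only.
COUNT=7
SUM=$(( COUNT + 3 ))      # addition
QUOT=$(( COUNT / 2 ))     # integer division drops the fraction
REM=$(( COUNT % 2 ))      # remainder of the integer division
POW=$(( 2 ** 10 ))        # exponentiation
let "COUNT += 5"          # increment COUNT in place
echo "$SUM $QUOT $REM $POW $COUNT"   # prints: 10 3 1 1024 12
```

Inside $(( )) the variables do not need a leading $.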


[Bash] Boolean Operators

!    NOT
&&   AND
||   OR

Example:

if [ $condition1 ] || [ $condition2 ]
then
    ...
fi


[Bash] Floating Point Operators

Bash has only integer arithmetic. Use the bc command for floating-point calculations.

variable=$(echo "OPTIONS; OPERATIONS" | bc)

For example:

DY=$(echo "scale=3;100*$VAR1/$VAR2" | bc -l)
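
A short sketch of the pattern above, assuming bc is installed. VAR1 and VAR2 are given sample values here, and IS_LARGE is a hypothetical name used only for this illustration. It also shows a floating-point comparison, which bc answers with 1 (true) or 0 (false):

```shell
# Sample values, assumed for illustration.
VAR1=1
VAR2=3

# scale=3 keeps three digits after the decimal point in the division.
DY=$(echo "scale=3; 100 * $VAR1 / $VAR2" | bc)
echo "$DY"

# bc prints 1 for a true comparison and 0 for a false one.
if [ "$(echo "$DY > 33" | bc)" -eq 1 ]; then
    IS_LARGE=yes
else
    IS_LARGE=no
fi
echo "$IS_LARGE"
```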


[Bash] Integer Comparison Operators

-eq   Equal to
-ne   Not equal to
-gt   Greater than
-ge   Greater than or equal to
-lt   Less than
-le   Less than or equal to
<     Less than (within double parentheses)
<=    Less than or equal to (within double parentheses)
>     Greater than (within double parentheses)
>=    Greater than or equal to (within double parentheses)


[Bash] String Comparison Operators

=    Equal to
==   Equal to
!=   Not equal to
<    Less than, in ASCII alphabetical order
>    Greater than, in ASCII alphabetical order
-z   String is null
-n   String is not null
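
The string operators can be tried out with a sketch like the one below. NAME, EMPTY and RESULTS are placeholder names. It uses [[ ]] because the < and > operators would have to be escaped inside [ ]:

```shell
# Placeholder values for illustration.
NAME="apple"
EMPTY=""
RESULTS=""

[[ "$NAME" == "apple" ]]  && RESULTS+="equal "
[[ "$NAME" != "pear" ]]   && RESULTS+="differs "
[[ "$NAME" < "banana" ]]  && RESULTS+="sorts-first "
[[ -z "$EMPTY" ]]         && RESULTS+="empty "
[[ -n "$NAME" ]]          && RESULTS+="non-empty"
echo "$RESULTS"   # prints: equal differs sorts-first empty non-empty
```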


[Bash Script] Bash Declaration

#!/bin/bash


[Bash Script] Execute Another Shell Script

bash filename.sh


[Bash Script] Multiple Command Line Arguments

$1 is the first command line argument.
"$@" expands to all of the arguments, each as a separate word.

template.sh

#!/bin/bash

function doStuff(){
    ...
}

if [ -z "$1" ]
then
    echo
    echo "usage: doStuff [arguments]"
    echo "Separate multiple arguments with a space."
    echo ""
else
    for var in "$@"
    do
        if [ -n "$var" ]
        then
            doStuff "$var"
        fi
    done
fi
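
To see how "$@" keeps each argument intact, here is a self-contained sketch. The body of doStuff is a stand-in that only echoes its argument, and the set -- line merely simulates calling the script with three arguments; both are assumptions made for this demo:

```shell
# Stand-in for the template's doStuff: it just reports what it received.
doStuff() {
    echo "processing: $1"
}

# Simulate running: template.sh one "two words" three
set -- one "two words" three

COUNT=0
for var in "$@"; do
    doStuff "$var"
    COUNT=$(( COUNT + 1 ))
done
echo "handled $COUNT of $# arguments"   # prints: handled 3 of 3 arguments
```

Without the double quotes around $@, the second argument would be split into two words.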


[Conditionals & Loops] Brackets

[ ]     Test the expression between [ ] with the shell builtin "test" (file types, strings, integers).
[[ ]]   Test the expression between [[ ]]. A Bash keyword, more versatile than "[ ]": an extended "test" with pattern matching and no word splitting.
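
A small sketch of the difference between the two forms. FILE, CLASSIC and EXTENDED are hypothetical names used only here:

```shell
# A file name containing a space, to show why quoting matters in [ ].
FILE="notes backup.txt"

# [ ] is the classic test: quote variables so spaces do not split them.
if [ -n "$FILE" ] && [ "$FILE" != "other.txt" ]; then
    CLASSIC="ok"
fi

# [[ ]] adds glob matching, regex matching, and && / || inside the brackets.
if [[ $FILE == *.txt && $FILE =~ ^notes ]]; then
    EXTENDED="ok"
fi
echo "$CLASSIC $EXTENDED"   # prints: ok ok
```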


[Conditionals & Loops] If Then

if [ conditional expression ]
then
    ...
fi


[Conditionals & Loops] If Then Else

if [ conditional expression ]
then
    ...
else
    ...
fi


[Conditionals & Loops] If Then Else If Else

if [ conditional expression 1 ]
then
    ...
elif [ conditional expression 2 ]
then
    ...
elif [ conditional expression 3 ]
then
    ...
else
    ...
fi
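
A concrete instance of the template above, with SCORE and GRADE as made-up names for illustration:

```shell
SCORE=72   # sample value
if [ "$SCORE" -ge 90 ]
then
    GRADE="A"
elif [ "$SCORE" -ge 70 ]
then
    GRADE="B"
else
    GRADE="C"
fi
echo "$GRADE"   # prints: B
```

Only the first branch whose condition is true runs, so the order of the tests matters.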


[Conditionals & Loops] While Loop

while [ conditional expression ]; do
    ...
done
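
Two small counting loops as a sketch, one with a [ ] test and one with (( )), where the <, <=, >, >= comparison symbols can be used. The variable names are placeholders:

```shell
# Sum the integers 1 through 5 with a [ ] test.
I=1
TOTAL=0
while [ "$I" -le 5 ]; do
    TOTAL=$(( TOTAL + I ))
    I=$(( I + 1 ))
done
echo "$TOTAL"     # prints: 15

# Multiply the integers 1 through 5 with a (( )) test.
J=1
PRODUCT=1
while (( J <= 5 )); do
    (( PRODUCT *= J ))
    (( J += 1 ))
done
echo "$PRODUCT"   # prints: 120
```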


[File] Add to the End of a Line

sed -i "s/$/ADDTHIS/g" filename.txt


[File] Delete Empty Lines

sed -i "s/\r//g" out.txt
sed -i '/^$/d' out.txt


[File] Delete HTML Tags

sed -i "s/<[^>]*>//g" filename.txt
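
To preview the effect before editing a file in place, the same expression can be run on a stream. A minimal sketch, with a made-up HTML snippet:

```shell
# Made-up sample input.
HTML='<p>Hello <b>world</b>, see <a href="x.html">this page</a>.</p>'

# [^>]* matches everything up to the next closing angle bracket.
TEXT=$(printf '%s\n' "$HTML" | sed 's/<[^>]*>//g')
echo "$TEXT"   # prints: Hello world, see this page.
```

Because sed works line by line, this only removes tags that open and close on the same line.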


[File] Delete HTML Tag and Inner Text

e.g. the script tag. sed supports neither lazy quantifiers nor lookahead, so this uses Perl:

perl -0777 -pi -e 's/<script\b.*?<\/script>//gis' filename.txt


[File] Delete Lines Not Containing Keyword

sed -i '/KEYWORD/!d' filename.txt
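
A sketch that combines this filter with the empty-line deletion shown earlier, run on sample text instead of a file. The log lines are invented for the example:

```shell
# Invented sample input with one empty line.
INPUT=$(printf 'error: disk full\n\ninfo: started\nerror: timeout')

# /^$/d drops empty lines; /error/!d drops every line without "error".
FILTERED=$(printf '%s\n' "$INPUT" | sed -e '/^$/d' -e '/error/!d')
echo "$FILTERED"
```

This prints the two lines that contain "error".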


[File] Extract Portion of Line After Keyword

EXTRACTED=$(grep -Po 'KEYWORD\K.*' filename.txt)


[File] For Each File in Folder

for i in "$FOLDER"/*
do
    if [[ -f "$i" ]]
    then
       ...
    fi
done


[File] For Each File in Subdirectory of Folders

for j in $(find "$FOLDER" -type d)
do
    for i in "$j"/*
    do
        if [[ -f "$i" ]]
        then
            ...
        fi
    done
done
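
Looping over the output of $(find ...) splits names at whitespace, so directories with spaces in their names get broken apart. One possible alternative, sketched here under the assumption that find supports -print0 (GNU and BSD find do), reads NUL-separated names instead. The temporary folder and file names are made up for the demo:

```shell
# Build a throwaway tree that contains spaces in its names.
FOLDER=$(mktemp -d)
mkdir -p "$FOLDER/sub dir"
touch "$FOLDER/a.txt" "$FOLDER/sub dir/b c.txt"

# -print0 and read -d '' use NUL as the separator, so spaces are harmless.
COUNT=0
while IFS= read -r -d '' i; do
    echo "found: $i"
    COUNT=$(( COUNT + 1 ))
done < <(find "$FOLDER" -type f -print0)
echo "$COUNT"   # prints: 2

rm -rf "$FOLDER"
```

Here find selects regular files directly with -type f, which replaces the inner loop and the -f test.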


[File] Insert at the Beginning of a Line

sed -i "s/^/INSERTTHIS/g" filename.txt


[File] Search and Replace

sed -i "s/FIND/REPLACE/g" out.txt


[Input/Output] File Concatenation

cat "file1.txt" >> "file2.txt"


[Input/Output] File Overwriting

cat "file1.txt" > "file2.txt"


[Input/Output] Read a File Line By Line

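# IFS= and -r keep whitespace and backslashes intact; the || test also handles a final line without a newline.
while IFS= read -r LINE || [[ -n "$LINE" ]]; do
    echo "$LINE"
done < filename.txt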


[Other Commands] Date & Time

DATETIME=$(date)
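
date also accepts a format string after a plus sign, which is handy for time-stamped file names. A small sketch; STAMP and LOGFILE are hypothetical names:

```shell
# %Y-%m-%d is year-month-day, %H-%M-%S is hours-minutes-seconds.
STAMP=$(date +%Y-%m-%d_%H-%M-%S)
LOGFILE="backup_$STAMP.log"   # hypothetical file name
echo "$LOGFILE"
```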


[Other Commands] Download Web Page

wget -q "$URL" -O filename.txt


[Variable] Convert to Lowercase

SAMPLESTRING=$(echo "$SAMPLESTRING" | tr '[:upper:]' '[:lower:]')


[Variable] Convert to Uppercase

SAMPLESTRING=$(echo "$SAMPLESTRING" | tr '[:lower:]' '[:upper:]')


[Variable] Search & Replace

SAMPLESTRING=$(echo "$SAMPLESTRING" | sed -e 's/FIND/REPLACE/g')
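
As an alternative to piping through sed or tr, Bash parameter expansion can do simple replacements and case changes without starting another process. This is a sketch of that built-in technique, not the method listed above. The case conversions assume Bash 4 or later, and the result variable names are placeholders:

```shell
SAMPLESTRING="Hello World, hello Bash"

REPLACED=${SAMPLESTRING//hello/goodbye}   # replace every match (case-sensitive)
FIRST_ONLY=${SAMPLESTRING/l/L}            # replace the first match only
LOWER=${SAMPLESTRING,,}                   # Bash 4+: convert to lowercase
UPPER=${SAMPLESTRING^^}                   # Bash 4+: convert to uppercase

echo "$REPLACED"     # prints: Hello World, goodbye Bash
echo "$FIRST_ONLY"   # prints: HeLlo World, hello Bash
echo "$LOWER"        # prints: hello world, hello bash
echo "$UPPER"        # prints: HELLO WORLD, HELLO BASH
```

The search part is a shell pattern rather than a regular expression, so sed remains the better fit for anything beyond literal text and simple wildcards.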


[Variable] Substring Extraction

${string:position:length}

For example:

MYSTRING="Sample text"
MYSUBSTRING=${MYSTRING:7:4}
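
A few more variations on the same string, as a sketch. Positions count from 0, and the variable name goes inside the braces without a second $. The result variable names are placeholders:

```shell
MYSTRING="Sample text"

WORD2=${MYSTRING:7:4}    # 4 characters starting at position 7
WORD1=${MYSTRING:0:6}    # 6 characters starting at position 0
TAIL=${MYSTRING:7}       # length omitted: everything from position 7 on
LENGTH=${#MYSTRING}      # number of characters in the string

echo "$WORD2"    # prints: text
echo "$WORD1"    # prints: Sample
echo "$TAIL"     # prints: text
echo "$LENGTH"   # prints: 11
```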


[Documentation] sed

sed OPTIONS... [SCRIPT] [INPUTFILE...]

Options
=======
    -e    Add the script that follows to the commands to be executed.
    -i    Edit files in place.

Address
=======
    addr1,addr2  Address range is separated by a comma.
    number       Line Number.
    first~step   e.g. '2~3': every third line (step) starting with the second line (first). GNU extension.
    $            Last line.
    /regexp/     Lines matching the regular expression regexp.
    addr1,+N     Matches addr1 and the N lines following addr1.
    addr1,~N     Matches addr1 and the lines following addr1 until the next line whose input line number is a multiple of N.
    !            Negation. Appended to an address, it selects the lines that do not match.

Commands Which Accept Address
=============================

    [address]command

    d    Delete
    p    Print
    s    Substitution

s Command
=========
    s/regexp/replacement/flags

    Flags
    -----
        g        Apply to all matches
        number   Replace only the numberth match of the regexp.
        p        Print the new pattern space if a substitution was made
        i        Case-insensitive regexp match
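
The addresses and flags above can be explored on a stream before touching any file. A minimal sketch with invented sample data; it also uses the -n option (suppress automatic printing), which is a standard sed option not listed above:

```shell
# Invented five-line sample.
SAMPLE=$(printf 'one\ntwo\nthree\nfour\nfive')

# Range address: with -n, 2,4p prints only lines 2 to 4.
RANGE=$(printf '%s\n' "$SAMPLE" | sed -n '2,4p')

# $ addresses the last line; ! negates it, so every other line is deleted.
LAST=$(printf '%s\n' "$SAMPLE" | sed '$!d')

# /regexp/ address combined with the s command: only lines starting with t.
SWAPPED=$(printf '%s\n' "$SAMPLE" | sed '/^t/s/t/T/')

# Numeric flag: replace only the second match on the line.
SECOND=$(echo "a-b-c-d" | sed 's/-/+/2')

echo "$LAST"     # prints: five
echo "$SECOND"   # prints: a-b+c-d
```

RANGE holds the lines two, three and four; SWAPPED holds the sample with "two" and "three" capitalised.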
