Extract words from a StarDict file to CSV

Oct 28, 2024 16:31


1) Download the dictionary files (FreeDict TEI sources):
git clone https://github.com/freedict/fd-dictionaries.git

2) Download PyGlossary, the Python package used to read and convert the dictionary files:
git clone https://github.com/ilius/pyglossary.git
cd pyglossary/
cp /home/ubuntu/fd-dictionaries/eng-hin/eng-hin.tei .
python3 main.py

# Use PyGlossary to convert eng-hin.tei to out.txt

3) Select the first 3 columns (replace each tag with "|", then keep the first three "|"-separated fields):

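awk '{
    # Replace every HTML/XML tag with a "|" separator
    gsub(/<[^>]*>/, "|");

    # Collapse runs of "|" into a single separator
    gsub(/\|+/, "|");

    # Keep the first three "|"-terminated fields; skip lines that have fewer
    if (match($0, /([^|]*\|){3}/)) {
        first_three = substr($0, RSTART, RLENGTH);
        print first_three
    }
}' out.txt > test.csv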
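
To check the awk step on a single line before running it on the whole file, a minimal sketch follows. The sample line and the function name first_three are made up for illustration, and the real layout of out.txt may differ. The {3} repetition is written out in full here because some older awk builds (for example older mawk releases) do not support interval expressions.

```shell
# Hypothetical helper: reads lines on stdin and prints the first three
# "|"-terminated fields of each line, using the same steps as above.
first_three() {
    awk '{
        # Replace every tag with "|", then collapse repeated separators
        gsub(/<[^>]*>/, "|")
        gsub(/\|+/, "|")

        # Same pattern as ([^|]*\|){3}, written out without the interval
        if (match($0, /[^|]*\|[^|]*\|[^|]*\|/))
            print substr($0, RSTART, RLENGTH)
    }'
}

# Made-up sample line, not taken from a real out.txt
printf '%s\n' 'hello<b>noun</b><i>namaste</i>rest' | first_three
```

For the sample line this prints hello|noun|namaste| (the trailing "|" is kept because the pattern matches each field together with its separator). Lines with fewer than three separators produce no output.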