Listing Icelandic given names

Jan 02, 2025 16:51



Yesterday's entry on given names in Hrím had me looking at the official register of given names. There's no complete list available, and since I wanted one in a useful format (such as CSV) I decided to put it together myself. Here's how.



  1. Start by getting the list of names for each initial letter: apply the filters and save the page with the table in your browser. In Firefox at least, make sure you save it as "Web page, complete" rather than "Web page, HTML only", since the latter saves the original page, not the current page with its dynamically loaded elements.



  2. Put all the files in a subdirectory, say orig, and run the following script:

    #!/usr/bin/perl

    use strict;
    use feature qw/say/;
    use File::Slurp qw/read_file write_file append_file/;

    my @files = glob("orig/*.htm*");  # assumed pattern; adjust to match your saved pages

    write_file("merged.htm", "");

    for my $filename (@files) {
        say "Processing $filename ...";

        my $file = read_file($filename);

        $file =~ s/^.*//gs;
        $file =~ s#.*$##gs;
        $file =~ s/ class=".*?"//gs;

        my $checkmarksvg = q|

    |;
        my $infosvg = q|

    |;
        my $opensvg = q|
    |;
        my $crossmarksvg = q|

    |;
        # Replace the status icons with plain-text markers:
        # samþykkt = approved, hafnað = rejected, (i) = has an annotation
        $file =~ s/$checkmarksvg/samþykkt/g;
        $file =~ s/$infosvg/(i)/g;
        $file =~ s/$opensvg//g;
        $file =~ s/$crossmarksvg/hafnað/g;

        $file =~ s/ target="_blank" rel="noopener noreferrer" tabindex="-1"//g;
        $file =~ s###g;
        $file =~ s###g;
        $file =~ s###g;
        $file =~ s##\n#g;

        # The cleaned copy goes to the current directory under the same name
        (my $outfilename = $filename) =~ s#orig/##;

        # Split into lines, keeping the line endings
        my @lines = split /^/, $file;

        append_file("merged.htm", (@lines, "\n"));
        @lines = ("\n", @lines, "\n\n");
        write_file($outfilename, @lines);

    }

    This writes a cleaned-up copy of each page to the current directory, plus a combined file, merged.htm.



  3. Run the command cat merged.htm | sort | uniq >merged2.htm to filter out duplicate entries (probably not necessary).



  4. Run the command cat merged2.htm | perl processit.pl, using the following script:

    #!/usr/bin/perl

    use strict;
    use utf8;
    use open qw(:std :utf8);
    use feature qw/say/;
    use Data::Dumper;
    use List::Util qw/uniq/;
    use Text::CSV;

    my $nöfn = {};

    while(<>) {

        chomp;
        s/\r//g;
        s/\n//g;

        s///;
        s###;
        s/Úrskurður\s*//gi;

        my @dót = split m##, $_;

        # First column: gender (kyn), optionally flagged "(ritbr.)"
        my $kyn = $dót[0];
        my $ritbr = 0;
        if($kyn =~ /\s*\(ritbr.\)$/) {
            $kyn =~ s/\s*\(ritbr.\)$//;
            $ritbr = 1;
        }

        #say join "\n", @dót;

        # Second column: the name (nafn); "(i)" marks an annotation
        my $nafn = $dót[1];
        my $upplýsingar = 0;
        if($nafn =~ /\(i\)/) {
            $nafn =~ s/\(i\)//;
            $upplýsingar = 1;
        }

        # Third column: the decision (ákvörðun), possibly with a date,
        # a link (tengill) and a dissenting-opinion note (sératkvæði)
        my $ákvörðun = $dót[2];
        my $dagsetning = "";
        my $tengill = "";
        my @athugasemdir = ();
        $ákvörðun =~ s#$##;
        $ákvörðun =~ s/\s*$//;
        $ákvörðun =~ s/^\s*//;

        if($ákvörðun =~ /\s*-\s*sératkvæ/) {
            $ákvörðun =~ s/\s*-\s*sératkvæ//;
            push @athugasemdir, "sératkvæði";
        }

        if($ákvörðun =~ / /) {
            $ákvörðun =~ s/
    //;
            $tengill = $1;
            $tengill =~ s/amp;//g;
        }

        # Pull the date out of the decision text; both "apríl" and "april" are accepted
        my $dagsetningar_re = '(\d{1,2}\.\d{1,2}\.(\d{2,})(\s*-\s*\d{1,2}\.\d{1,2}\.(\d{2,}))?|\d{1,2}\. (janúar|febrúar|mars|apr[ií]l|maí|júní|júlí|ágúst|september|október|nóvember|desember) (\d{2,})|\d{2}/\d{4}|\d{2}.\d{6}|\d{4})';
        if($ákvörðun =~ /$dagsetningar_re/) {
            $ákvörðun =~ s/$dagsetningar_re//;
            $dagsetning = $1;
        }

        # A name can have several entries, so collect them in a list per name
        my $athugasemdir = join ", ", @athugasemdir;
        push @{ $nöfn->{$nafn} }, {
            "Kyn" => $kyn,
            "Ritháttur" => $ritbr,
            "Upplýsingar" => $upplýsingar,
            "Ákvörðun" => $ákvörðun,
            "Dagsetning" => $dagsetning,
            "Tengill" => $tengill,
            "Athugasemdir" => $athugasemdir,
        };

        # say Dumper $nöfn;
        # last

    }

    #my @x = map { $_->{"Ákvörðun"} } values %$nöfn;
    #say join "\n", uniq(@x)

    # my @x = grep { ($nöfn->{$_}->{"Ákvörðun"} ne "✓") and ($nöfn->{$_}->{"Ákvörðun"} ne "❌") and ($nöfn->{$_}->{"Ákvörðun"} ne "") } keys %$nöfn;
    # say join "\n", @x;

    # say Dumper $nöfn;

    my @atriði = qw/Kyn Ritháttur Upplýsingar Ákvörðun Dagsetning Athugasemdir Tengill/;

    my $kag = Text::CSV->new({ binary => 1, eol => "\n" });
    open my $FH, ">", "nöfn.csv" or die "Ekki tókst að opna nöfn.csv: $!";
    open my $HTML, ">", "nöfn.html" or die "Ekki tókst að opna nöfn.html: $!";

    $kag->print($FH, ['key', @atriði]);
    say $HTML "<table>\n<tr><th>Nafn</th><th>", join("</th><th>", @atriði), "</th></tr>";

    # Icelandic alphabetical order; characters not listed here sort last
    my @icelandic_order = qw(A Á B C D Ð E É F G H I Í J K L M N O Ó P Q R S T U Ú V W X Y Ý Z Þ Æ Ö);
    my %rank = map { $icelandic_order[$_] => $_ } 0 .. $#icelandic_order;

    sub compare_keys {

        my @a_chars = split //, uc($a);
        my @b_chars = split //, uc($b);

        my $len_a = scalar @a_chars;
        my $len_b = scalar @b_chars;

        for my $i (0 .. $len_a - 1) {
            last if $i >= $len_b;

            my $rank_a = $rank{$a_chars[$i]} // 999;
            my $rank_b = $rank{$b_chars[$i]} // 999;

            return $rank_a <=> $rank_b if $rank_a != $rank_b;
        }

        # Common prefix: the shorter name sorts first
        return $len_a <=> $len_b;
    }

    for my $nafn (sort compare_keys keys %$nöfn) {
        for my $færsla (@{ $nöfn->{$nafn} }) {
            my @röð = ($nafn, map { $færsla->{$_} // '' } @atriði);
            $kag->print($FH, \@röð);

            # The CSV keeps the bare URL; the HTML overview gets a link ("Smelltu hér" = "click here")
            if($færsla->{"Tengill"} ne "") {
                $færsla->{"Tengill"} = '<a href="' . $færsla->{"Tengill"} . '">Smelltu hér</a>';
            }
            @röð = ($nafn, map { $færsla->{$_} // '' } @atriði);
            say $HTML "<tr><td>", join("</td><td>", @röð), "</td></tr>";
        }
    }

    say $HTML "</table>";

    close $FH;
    close $HTML;

    This'll give you two files: nöfn.csv and nöfn.html. The former is suitable for further processing, the latter provides an unstyled overview of all names.
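
Step 3 can also be done without leaving Perl. The following is only a minimal sketch, assuming it is saved as dedupe.pl (a hypothetical name, as is the helper unique_lines). Unlike sort | uniq it keeps the lines in their original order, which makes no difference here because the second script sorts the names itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep only the first occurrence of each line,
# preserving the order in which the lines were first seen.
sub unique_lines {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# When file names are given on the command line, act as a filter:
#   perl dedupe.pl merged.htm > merged2.htm
if (@ARGV) {
    print unique_lines(<>);
}
```

The %seen hash counts how often each line has appeared so far; grep lets a line through only the first time its count is still zero.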



What I haven't done yet is integrate the annotations that the original pages also have for some names. Right now, there is only a flag indicating that an annotation exists, but in order to process the whole thing further you'd ideally want the text of the annotation itself. Also, dates aren't normalized to a common, sensible format yet.
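
Normalizing the dates could look roughly like this. It's a minimal sketch and not part of the scripts above: normalize_date is a hypothetical helper, and it assumes that only the two unambiguous shapes (d.m.yyyy and d. month yyyy) should be turned into ISO 8601. Date ranges, bare years, two-digit years and anything else are passed through unchanged, since guessing there would need a closer look at the data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;

# Icelandic month names; "april" without the accent is accepted as well.
my %month = (
    'janúar'    => 1,  'febrúar'  => 2,  'mars'     => 3,
    'apríl'     => 4,  'april'    => 4,  'maí'      => 5,
    'júní'      => 6,  'júlí'     => 7,  'ágúst'    => 8,
    'september' => 9,  'október'  => 10, 'nóvember' => 11,
    'desember'  => 12,
);

# Hypothetical helper: returns YYYY-MM-DD for the recognised shapes,
# and the input unchanged for everything else.
sub normalize_date {
    my ($raw) = @_;
    return $raw unless defined $raw;

    # Numeric form, e.g. "3.5.2010"
    if ($raw =~ /^(\d{1,2})\.(\d{1,2})\.(\d{4})$/) {
        my ($day, $mon, $year) = ($1, $2, $3);
        return sprintf "%04d-%02d-%02d", $year, $mon, $day;
    }

    # Written-out month, e.g. "17. maí 2010"
    if ($raw =~ /^(\d{1,2})\. (\p{L}+) (\d{4})$/) {
        my ($day, $name, $year) = ($1, lc $2, $3);
        return sprintf "%04d-%02d-%02d", $year, $month{$name}, $day
            if exists $month{$name};
    }

    return $raw;
}

print normalize_date($_), "\n" for "3.5.2010", "17. maí 2010", "2004";
```

Run as is, this prints 2010-05-03, 2010-05-17 and 2004. In the second script it could be applied to $dagsetning just before the entry is stored.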

The list of names reveals various curiosities, BTW. Quite a few names were apparently applied for several times before they were finally permitted, and I feel not all of them integrate well into the Icelandic language (which is to say I'm not sure I would have voted to permit them all). Others were denied for no reason I can readily discern (such as Aðalbjörgvin). Still others are funny but were probably rightly denied, such as the seasonally appropriate Eldflaug (rocket) for a girl. And then there are names that made me think a child shouldn't be made to bear them just because the parents thought they had a funny idea, such as Kóbra.

