INDEX
    Explanations

    references to countries and regions, particularly focusing on mentions of Croatia and Yugoslavia

    references to Croatia and Yugoslavia

    Negative Logits
    Token         Logit
    "ellation"    -0.80
    "lying"       -0.76
    "VALUE"       -0.72
    "reads"       -0.70
    "ENTS"        -0.70
    "WARD"        -0.68
    " certs"      -0.68
    "atives"      -0.67
    "ATA"         -0.65
    "NING"        -0.64
    Positive Logits
    Token         Logit
    " Croatia"    1.11
    "č"           0.92
    "ć"           0.90
    "oslov"       0.90
    " Herz"       0.89
    " Ukrain"     0.83
    "ovi"         0.82
    " Croatian"   0.81
    " Croat"      0.81
    "Cro"         0.80
    Activation Density: 0.009%

    No Known Activations
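
Two of the positive-logit tokens are stored by the tokenizer as the character pairs "Äį" and "Äĩ", which is how a byte-level BPE vocabulary spells the UTF-8 bytes of "č" and "ć". The sketch below shows one way to decode such token strings. It assumes a GPT-2-style byte-to-character table; the page does not name the model or tokenizer, so treat that as an assumption. The helper names `bytes_to_unicode` and `decode_token` are illustrative, not taken from the page.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Rebuild a GPT-2-style table mapping each byte to a printable character.

    Assumption: the feature's model uses this byte-level BPE alphabet.
    """
    # Bytes that are already printable keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in printable}
    # Every remaining byte gets the next code point from 256 upward, in byte order.
    offset = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + offset)
            offset += 1
    return table


def decode_token(token: str) -> str:
    """Map a byte-level token string back to text, assuming UTF-8 bytes."""
    char_to_byte = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[c] for c in token)
    # A token may hold only part of a multi-byte character, so do not fail hard.
    return raw.decode("utf-8", errors="replace")


for tok in ["Äį", "Äĩ"]:
    print(repr(tok), "->", repr(decode_token(tok)))
# 'Äį' -> 'č'
# 'Äĩ' -> 'ć'
```

Read this way, both tokens are Croatian letters, which is consistent with the explanations listed for this feature.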