INDEX
    Explanations

    terms related to national identity or national themes

    Negative Logits
    "/he"       -0.17
    "anuts"     -0.16
    "nice"      -0.16
    " nice"     -0.15
    " Nice"     -0.15
    "Nice"      -0.15
    "orgh"      -0.15
    "äºĭæĥħ"    -0.14
    "inand"     -0.14
    "elda"      -0.14
    Positive Logits
    "/local"    0.21
    "/reg"      0.19
    "/global"   0.18
    "LEGRO"     0.17
    "/world"    0.17
    "ized"      0.17
    "izing"     0.16
    "wide"      0.16
    "nap"       0.15
    "istic"     0.15
    Activation Density: 0.050%

    No Known Activations