    Explanations

    terms related to Western culture and influence

    Negative Logits
    Token           Logit
    "TO"            -0.17
    "stown"         -0.16
    "ocab"          -0.16
    "plete"         -0.15
    "amma"          -0.14
    "efon"          -0.14
    "FromClass"     -0.14
    "uld"           -0.14
    "odore"         -0.14
    "utr"           -0.14

    Positive Logits
    Token           Logit
    "ern"            0.35
    "ward"           0.31
    "ERN"            0.30
    "ers"            0.30
    "erner"          0.28
    "most"           0.28
    "s"              0.26
    "eners"          0.24
    "/right"         0.23
    " ern"           0.23

    Activation density: 0.038%

    No Known Activations