INDEX
    Explanations

    Names or references related to a specific person, possibly associated with legal or political contexts

    Negative Logits
    Token         Logit
    " ORIG"       -0.75
    " Lauder"     -0.74
    " drift"      -0.65
    " Atlantic"   -0.63
    "ashtra"      -0.63
    " Clash"      -0.62
    "agher"       -0.61
    " Irma"       -0.61
    " Surviv"     -0.60
    " Devi"       -0.60
    Positive Logits
    Token         Logit
    "lishing"      1.29
    "bing"         1.23
    "rious"        1.19
    "lique"        1.16
    "bles"         1.15
    "lish"         1.13
    "lisher"       1.13
    "ilant"        1.08
    "bed"          1.08
    "bish"         1.05
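    The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. The sketch below shows one common way such a ranking is produced. It assumes the feature's direct effect on each logit is its decoder direction multiplied through the model's unembedding matrix `W_U`. The helper name, shapes, and toy data are hypothetical and are not taken from this dashboard.

```python
import numpy as np

def top_logit_effects(decoder_direction, W_U, vocab, k=10):
    """Return the k lowest and k highest direct logit effects of a feature.

    Assumption (hypothetical helper): the effect on each token's logit is the
    feature's decoder direction (shape: d_model) projected through the
    unembedding matrix W_U (shape: d_model x vocab_size).
    """
    logits = np.asarray(decoder_direction) @ np.asarray(W_U)  # shape: (vocab_size,)
    order = np.argsort(logits)  # ascending, so the lowest effects come first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example: a 2-dimensional residual stream and a 4-token vocabulary.
W_U = np.array([[1.0, -1.0, -0.5, 0.0],
                [0.0,  0.0,  0.0, 3.0]])
direction = np.array([1.0, 0.5])
neg, pos = top_logit_effects(direction, W_U, ["a", "b", "c", "d"], k=2)
print("negative:", neg)  # negative: [('b', -1.0), ('c', -0.5)]
print("positive:", pos)  # positive: [('d', 1.5), ('a', 1.0)]
```

    Sorting once with `np.argsort` and reading both ends of the order gives the negative and positive lists in a single pass over the vocabulary.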
    Activation Density: 0.024%

    No Known Activations
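    Activation density is the share of sampled tokens on which the feature fires. The sketch below assumes that "fires" means an activation strictly above zero. The helper `activation_density` and its `threshold` parameter are hypothetical, and the toy data only illustrates what a figure like 0.024% means (24 tokens in 100,000).

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold`.

    Assumption (hypothetical helper): a feature "fires" on a token when its
    activation is strictly greater than the threshold (0 by default).
    """
    acts = np.asarray(activations)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Toy example: 24 firing tokens out of 100,000 sampled tokens.
acts = np.zeros(100_000)
acts[:24] = 1.0
print(f"{activation_density(acts):.3f}%")  # 0.024%
```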