INDEX
    Explanations

    geographical locations or entities

    the term "gent" related to social or diplomatic contexts

    Negative Logits
    WT            -0.77
    ADE           -0.75
    IRO           -0.71
    oufl          -0.71
    esty          -0.68
    ecause        -0.68
    senal         -0.68
    Downloadha    -0.66
    qqa           -0.65
    inki          -0.65
    Positive Logits
    gent          1.49
    lus           0.80
    rant          0.75
    rants         0.75
    ente          0.74
    nesses        0.72
    gently        0.71
    rification    0.70
    龍契士        0.70
    wick          0.69
    Act Density 0.006%

    No Known Activations