INDEX
    Explanations

    references to suffering or experiencing negative conditions

    Negative Logits
    eb          -0.17
    ehler       -0.16
    asant       -0.16
    il          -0.16
    evi         -0.15
    oria        -0.15
     vivo       -0.15
    erna        -0.15
    ardon       -0.15
    cheng       -0.15
    Positive Logits
    IDA         0.19
    zeug        0.16
    zcze        0.15
    proof       0.15
    instein     0.14
    deaux       0.14
    icial       0.14
    ityEngine   0.14
    ëį          0.14
    edReader    0.14
    Act Density 0.027%

    No Known Activations