INDEX
    Explanations

    words related to emotional or physical suffering

    Negative Logits
    tual       -0.15
    lesi       -0.15
    ekli       -0.14
    .ca        -0.14
     дорож     -0.13
    TC         -0.13
    우리       -0.13
     å¯        -0.13
    솔         -0.13
    ereum      -0.13
    Positive Logits
    umbo       0.16
    394        0.15
    нят        0.15
    396        0.15
    ILD        0.15
    IBE        0.15
     pir       0.14
    idor       0.14
    avax       0.13
     IDS       0.13
    Activation Density 0.006%

    No Known Activations