INDEX
    Explanations

    expressions of fear and risk related to personal safety

    Negative Logits
    Token         Logit
    "ehler"       -0.16
    "ĮĴ"          -0.15
    " culprit"    -0.15
    "خش"          -0.14
    "Įĵ"          -0.14
    "Ậ"           -0.14
    " guilty"     -0.14
    "insi"        -0.13
    "affen"       -0.13
    " Regards"    -0.13
    Positive Logits
    Token          Logit
    " being"       0.40
    "being"        0.34
    " Being"       0.29
    "Being"        0.29
    " becoming"    0.29
    " scrutiny"    0.28
    "被"           0.27
    " detection"   0.25
    " losing"      0.24
    " attack"      0.23
    Activation Density: 0.227%

    No Known Activations