INDEX
    Explanations

    the presence of strong emotional or impactful language

    Negative Logits
    andon    -0.15
    $MESS    -0.15
    ellar    -0.15
    .om    -0.14
     Omni    -0.14
    omi    -0.14
    ç¿Ķ    -0.14
     Bands    -0.14
    emo    -0.14
    654    -0.14
    Positive Logits
    orte    0.17
     nat    0.15
    ected    0.15
    els    0.15
    nat    0.15
     Baths    0.15
    ague    0.14
    ense    0.14
     Nat    0.14
    Nat    0.14
    Activation Density: 0.027%

    No Known Activations