INDEX
    Explanations

    phrases indicating chaotic or violent scenarios

    Negative Logits
    "드리"         -0.15
    " 놀"          -0.15
    "nat"          -0.14
    "conde"        -0.14
    "ulk"          -0.13
    "aepernick"    -0.13
    "_nat"         -0.13
    "хов"          -0.13
    " یوتی"        -0.13
    "گران"         -0.12
    Positive Logits
    " literal"      1.01
    " literally"    1.00
    " Liter"        0.96
    "liter"         0.91
    " Literal"      0.79
    "Liter"         0.78
    "literal"       0.76
    "-liter"        0.73
    "Literal"       0.71
    " liter"        0.70
    Activation Density: 0.023%

    No Known Activations