INDEX
Explanations
phrases indicating chaotic or violent scenarios
Negative Logits
Token       Logit
드리        -0.15
놀          -0.15
nat         -0.14
conde       -0.14
ulk         -0.13
aepernick   -0.13
_nat        -0.13
хов         -0.13
یوتی        -0.13
گران        -0.12
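In the raw tokenizer vocabulary, several of the negative-logit tokens are stored as byte-level BPE strings rather than readable text (for example ëĨĢ for the Korean 놀 and ÛĮÙĪØªÛĮ for the Persian یوتی). GPT-2-style tokenizers map every UTF-8 byte to a printable stand-in character, so a raw token is made readable by mapping it back to bytes and decoding them. The sketch below assumes the underlying model uses GPT-2's byte table (the dashboard does not name the model). `decode_token` is a hypothetical helper, and keeping characters outside the table as plain UTF-8 is an assumption meant to cover partially decoded strings such as Ñħов.

```python
# Sketch: undo GPT-2-style byte-level BPE so raw vocabulary strings become readable.
# Assumption: the model's tokenizer uses GPT-2's byte-to-character table.


def bytes_to_unicode() -> dict[int, str]:
    """Build GPT-2's reversible table from each byte value to a printable character."""
    # Bytes that are already printable map to themselves.
    printable = list(range(0x21, 0x7F)) + list(range(0xA1, 0xAD)) + list(range(0xAE, 0x100))
    byte_vals = printable[:]
    code_points = printable[:]
    n = 0
    for b in range(256):
        if b not in byte_vals:
            # Control, whitespace and a few other bytes are shifted to code points 256+.
            byte_vals.append(b)
            code_points.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(byte_vals, code_points)}


# Inverse table: printable stand-in character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a byte-level BPE token string back to bytes and decode them as UTF-8."""
    raw = bytearray()
    for ch in token:
        if ch in CHAR_TO_BYTE:
            raw.append(CHAR_TO_BYTE[ch])
        else:
            # Assumption: characters outside the table were already decoded by the
            # display layer, so they are kept by re-encoding them as UTF-8.
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


print(decode_token("ëĨĢ"))  # 놀
print(decode_token("ÛĮÙĪØªÛĮ"))  # یوتی
```

One limitation of the fallback: an already-decoded Latin-1 letter such as é is indistinguishable from a byte stand-in and would be treated as a byte.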
Positive Logits
Token       Logit
literal     1.01
literally   1.00
Liter       0.96
liter       0.91
Literal     0.79
Liter       0.78
literal     0.76
-liter      0.73
Literal     0.71
liter       0.70
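A common way to produce positive and negative logit lists like the two above is to take the feature's direct effect on the output vocabulary: multiply its decoder direction by the unembedding matrix and keep the most extreme entries. The sketch below assumes exactly that, with no final LayerNorm and no rescaling. The names `logit_effects` and `top_and_bottom`, and the toy matrices, are hypothetical and not taken from this feature.

```python
# Sketch: rank vocabulary tokens by a feature's direct effect on the logits.
# Assumption: effect = decoder_row @ W_U, with no final LayerNorm or rescaling.


def logit_effects(decoder_row: list[float], w_u: list[list[float]]) -> list[float]:
    """Project a (d_model,) decoder direction through a (d_model, d_vocab) unembedding."""
    d_vocab = len(w_u[0])
    return [
        sum(decoder_row[d] * w_u[d][t] for d in range(len(decoder_row)))
        for t in range(d_vocab)
    ]


def top_and_bottom(effects: list[float], vocab: list[str], k: int = 10):
    """Return the k most negative and the k most positive (token, effect) pairs."""
    order = sorted(range(len(effects)), key=effects.__getitem__)  # ascending by effect
    negative = [(vocab[i], effects[i]) for i in order[:k]]
    positive = [(vocab[i], effects[i]) for i in order[::-1][:k]]
    return negative, positive


# Toy example: d_model = 2 and a four-token vocabulary (all values hypothetical).
w_u = [[1.0, 0.0, -1.0, 0.5],
       [0.0, 1.0, 0.0, -0.5]]
effects = logit_effects([1.0, 2.0], w_u)
negative, positive = top_and_bottom(effects, ["a", "b", "c", "d"], k=2)
print(negative)  # [('c', -1.0), ('d', -0.5)]
print(positive)  # [('b', 2.0), ('a', 1.0)]
```

With NumPy arrays the projection is simply `decoder_row @ W_U`; the pure-Python loop only keeps the sketch dependency-free.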
Activation Density: 0.023%
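Activation density is the share of token positions on which the feature fires; 0.023% works out to roughly 23 firings per 100,000 tokens. The sketch below assumes "fires" means an activation strictly above a threshold of zero; `activation_density` and its `threshold` parameter are illustrative names, and the activation values are made up.

```python
# Sketch: activation density as the percentage of token positions where a feature fires.
# Assumption: "fires" means an activation strictly above `threshold` (default 0.0).


def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Return the share of positions with activation > threshold, in percent."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical run: the feature fires on 23 of 100,000 tokens.
acts = [1.5] * 23 + [0.0] * 99_977
print(f"{activation_density(acts):.3f}%")  # 0.023%
```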