INDEX
Explanations
mentions of conflicts or war-related events
Negative Logits
çīĪ       -0.68
*/(       -0.66
uez       -0.64
Vec       -0.63
HCR       -0.62
Dinner    -0.61
supra     -0.61
Square    -0.61
Manga     -0.60
Philos    -0.59
Positive Logits
ridden     1.44
induced    1.28
related    1.25
fighting   1.24
resistant  1.19
inducing   1.19
filled     1.19
prone      1.17
laden      1.15
torn       1.14
Activations Density: 0.064%