Explanations
mentions of ceasefire violations
Negative Logits
Token       Logit
"HH"        -0.80
"Inv"       -0.77
"Widget"    -0.75
"Exper"     -0.73
"Offline"   -0.72
"Compan"    -0.69
"Stud"      -0.68
"Jess"      -0.68
"Iv"        -0.68
"Sym"       -0.67
Positive Logits
Token       Logit
"ray"       0.77
"law"       0.71
"offending" 0.68
"pot"       0.68
"fry"       0.67
"justice"   0.66
"dope"      0.66
"care"      0.65
"-"         0.65
"borne"     0.64
Activation density: 0.252%