Explanations
terms related to violent events, such as shootings and wars
Negative Logits
gers        -0.70
ĪĴ          -0.69
thens       -0.69
ilk         -0.69
stal        -0.68
shire       -0.67
yssey       -0.67
BILITIES    -0.67
nos         -0.66
tis         -0.65
Positive Logits
achusetts    1.48
achus        1.06
imil         0.86
acre         0.85
achu         0.82
transit      0.79
aging        0.79
exodus       0.78
eval         0.77
ages         0.77
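The page does not say how these two lists are produced. Assuming each value is the feature's direct effect on one vocabulary token's logit, the lists are just the k most negative and k most positive entries of that per-token vector. The sketch below shows only that selection step. The function name `top_and_bottom_logits` and the default k=10 are hypothetical, and the toy input reuses a few values from the lists above.

```python
import heapq
from typing import Dict, List, Tuple

Pair = Tuple[str, float]


def top_and_bottom_logits(
    logit_effects: Dict[str, float], k: int = 10
) -> Tuple[List[Pair], List[Pair]]:
    """Hypothetical helper: return (most negative, most positive) token/value pairs.

    Assumes `logit_effects` maps each token string to the feature's effect on
    that token's logit; how those effects are computed is not shown here.
    """
    items = logit_effects.items()
    # nsmallest returns ascending order, so the most negative value comes first.
    negative = heapq.nsmallest(k, items, key=lambda kv: kv[1])
    # nlargest returns descending order, so the most positive value comes first.
    positive = heapq.nlargest(k, items, key=lambda kv: kv[1])
    return negative, positive


if __name__ == "__main__":
    toy = {"achusetts": 1.48, "achus": 1.06, "transit": 0.79, "gers": -0.70, "ĪĴ": -0.69}
    neg, pos = top_and_bottom_logits(toy, k=2)
    print("negative:", neg)
    print("positive:", pos)
```

Using `heapq` avoids sorting the whole vocabulary when only the extremes are needed.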
Activation Density: 0.671%
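Activation density is presumably the percentage of tokens on which this feature fires. Under that reading, 0.671% means about 6.7 firings per 1,000 tokens. The sketch below assumes that "fires" means an activation strictly above a threshold of 0. Both the threshold and the helper name `activation_density` are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Hypothetical helper: percentage of tokens whose activation exceeds `threshold`."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    # Count tokens on which the feature is considered active.
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Toy data: 7 active tokens out of 1,000.
    acts = [0.0] * 993 + [1.2] * 7
    print(f"{activation_density(acts):.3f}%")
```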