Explanations
phrases related to criminal investigations and suspect activities
Negative Logits
Its            -0.60
gorilla        -0.59
nutshell       -0.57
ucks           -0.54
dude           -0.53
Its            -0.53
acus           -0.52
pires          -0.52
killer         -0.51
rapist         -0.51
Positive Logits
respectively    1.23
themselves      0.94
apiece          0.92
respective      0.91
collectively    0.76
respective      0.76
their           0.73
subsequently    0.73
their           0.71
individually    0.70
Activation Density: 0.970%