INDEX
Explanations
names of films or TV shows
words related to crime and legal issues
Negative Logits
BO        -0.61
Pace      -0.60
bound     -0.59
MP        -0.57
SM        -0.56
AW        -0.56
WR        -0.55
Arm       -0.54
START     -0.54
Factor    -0.53
Positive Logits
theless   0.85
ciating   0.79
ilib      0.77
uration   0.77
sylvania  0.77
ukong     0.76
姫        0.76
tenance   0.76
vous      0.75
ulty      0.73
Activations Density: 0.161%