INDEX
Explanations
words related to legal and criminal activities, such as "charge" and "charged"
Negative Logits
angular       -0.65
livest        -0.64
obser         -0.64
Remastered    -0.64
illusion      -0.63
Orth          -0.63
ophe          -0.62
VIS           -0.62
patch         -0.61
atters        -0.61
Positive Logits
heet          1.24
criminally    1.09
llah          1.01
charges       0.90
indicted      0.83
indict        0.83
illo          0.82
indictment    0.76
accused       0.74
perjury       0.74
Activations Density 0.029%