INDEX
Explanations
mentions of people or names, specifically those starting with the letter "D"
Negative Logits
ump     -0.21
ock     -0.20
ocs     -0.17
ays     -0.17
iesel   -0.17
ocking  -0.16
ATA     -0.16
raw     -0.16
uner    -0.16
agger   -0.15
Positive Logits
arry    0.22
eric    0.21
aron    0.21
omen    0.20
anel    0.20
yll     0.19
arr     0.19
erval   0.18
IMIT    0.18
hire    0.18
Activations Density 0.028%