INDEX
Explanations
proper names or names of individuals
mentions of the name "Baron" or "Barron."
Negative Logits
mable      -0.84
icago      -0.72
Redd       -0.69
ulatory    -0.65
DN         -0.65
reek       -0.64
ulative    -0.63
mberg      -0.63
extrem     -0.62
recourse   -0.62
Positive Logits
Baron      1.22
esses      1.04
Mord       0.98
ess        0.95
fman       0.93
stown      0.81
Buster     0.75
lord       0.75
Kirin      0.74
yip        0.72
Activations Density: 0.002%