Explanations
references to authority figures or individuals in leadership roles
Negative Logits
enegger    -0.69
ELL        -0.66
anus       -0.66
sung       -0.65
orph       -0.65
Sold       -0.64
æĺ         -0.63
gnu        -0.61
Boo        -0.61
milo       -0.60
Positive Logits
overseeing  1.06
boards      0.86
hyde        0.80
of          0.79
eering      0.73
oversee     0.71
taker       0.69
Reviewer    0.68
thereof     0.67
helm        0.67
Activations Density: 0.009%