INDEX
Explanations
words related to authority figures or positions of power
references to the article "the" in various contexts
Negative Logits
SPONSORED      -0.88
VICE           -0.72
wine           -0.68
heses          -0.67
Course         -0.67
JUST           -0.66
SetFontSize    -0.66
pps            -0.66
estyles        -0.65
owned          -0.64
Positive Logits
proverbial     0.92
notion         0.87
edges          0.83
weeds          0.83
heels          0.83
rug            0.83
unsuspecting   0.82
slightest      0.81
offending      0.81
enorm          0.81
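The page does not say how the logit values above are computed. A common reading, assumed here, is that each value is the feature's output direction projected through the model's unembedding matrix, so that positive entries are tokens the feature promotes and negative entries are tokens it suppresses. The sketch below illustrates that reading with made-up numbers; `logit_effects`, `top_and_bottom`, the two-dimensional direction, and the toy unembedding matrix are all hypothetical and are not taken from this page.

```python
# Assumption: a token's logit effect is the dot product of the feature's
# decoder direction with that token's column of the unembedding matrix.
# All numbers below are toy values chosen for illustration only.

def logit_effects(decoder_direction, unembedding, vocab):
    """Return {token: effect} for one feature direction."""
    effects = {}
    for j, token in enumerate(vocab):
        effects[token] = sum(
            d * row[j] for d, row in zip(decoder_direction, unembedding)
        )
    return effects


def top_and_bottom(effects, k):
    """Return the k most promoted and the k most suppressed tokens."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])
    most_positive = ranked[-k:][::-1]  # largest effect first
    most_negative = ranked[:k]         # most negative effect first
    return most_positive, most_negative


vocab = ["proverbial", "notion", "wine", "VICE"]
decoder_direction = [1.0, -0.5]
unembedding = [                  # shape: (direction dims, vocab size)
    [0.5, 0.25, -0.5, -0.5],
    [-0.5, -0.5, 0.25, 0.75],
]

effects = logit_effects(decoder_direction, unembedding, vocab)
positive, negative = top_and_bottom(effects, k=2)
print("positive:", positive)
print("negative:", negative)
```

Sorting once and slicing both ends mirrors the two ten-row lists shown on the page.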
Activations Density 0.470%
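The density figure is usually read as the fraction of sampled tokens on which the feature fires at all. The page does not define it, so the sketch below assumes that definition; `activation_density`, the zero threshold, and the sample data are hypothetical and chosen only to show the arithmetic.

```python
# Assumption: density = (tokens with activation above threshold) / (all tokens).
# The sample is synthetic and is not data from the page.

def activation_density(activations, threshold=0.0):
    """Fraction of positions whose activation exceeds the threshold."""
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# 47 active positions out of 10,000 sampled tokens.
sample = [0.0] * 9953 + [1.0] * 47
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.470%
```

Under this reading, a density of 0.470% means the feature is inactive on more than 99% of tokens.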