INDEX
Explanations
references to individuals with a title of "Sir"
titles or honorifics associated with notable individuals
Negative Logits
Token       Logit
graft       -0.73
endas       -0.66
packing     -0.65
itol        -0.65
revolving   -0.65
HOU         -0.63
wom         -0.63
closest     -0.63
combust     -0.61
endars      -0.61
Positive Logits
Token       Logit
zech        1.04
dinand      0.83
Isaac       0.83
Arthur      0.80
Cyr         0.80
Malcolm     0.79
Winston     0.79
Ian         0.78
ius         0.77
Laun        0.76
Activations Density: 0.016%