INDEX
Explanations
A set of somewhat random terms, with a slight bias toward strong opinions and terms of degree, as well as discussion of storytelling and race.
Negative Logits
NUMX            -0.60
phite           -0.60
ngua            -0.60
estyles         -0.60
AndroidJUnit    -0.59
TextAlign       -0.59
сылкі           -0.57
aktery          -0.54
saraba          -0.54
cloudflare      -0.54
Positive Logits
Monfieur        0.63
myſelf          0.55
Jefus           0.54
themſelves      0.52
itſelf          0.51
himſelf         0.50
Conſ            0.50
ſche            0.49
ſeveral         0.49
TagMode         0.48
Activations Density: 0.266%