INDEX
Explanations
percentage-related phrases and statistics
Negative Logits
Token     Logit
ught      -0.15
reh       -0.14
ternal    -0.14
iÃŁ       -0.14
onom      -0.14
chter     -0.14
ours      -0.14
king      -0.14
ische     -0.14
ger       -0.13
Positive Logits
Token     Logit
majority  0.23
Majority  0.18
éºĹ       0.16
Sizer     0.15
licken    0.15
OSC       0.14
ppv       0.14
every     0.13
none      0.13
Barton    0.13
Activations Density: 0.081%