INDEX
Explanations
references to books and reading-related activities
Negative Logits
Token      Logit
ante       -0.18
gne        -0.15
GPC        -0.14
ickey      -0.14
antes      -0.14
Dys        -0.14
verages    -0.14
ulty       -0.14
Duffy      -0.13
oner       -0.13
Positive Logits
Token      Logit
elian      0.17
Nat        0.17
Nat        0.15
NAN        0.15
istine     0.14
isko       0.14
ğinin      0.14
osg        0.14
igers      0.13
Troll      0.13
Activations Density 0.411%