INDEX
Explanations
references to sections or categories formatted as "under [number]"
Negative Logits
alon        -0.17
eum         -0.16
agu         -0.16
olon        -0.15
Townsend    -0.14
ahun        -0.14
shed        -0.14
CALE        -0.14
ilder       -0.14
åij½         -0.14
Positive Logits
437         0.15
esa         0.15
isz         0.15
asma        0.15
acio        0.14
akan        0.14
iez         0.14
spi         0.14
ascar       0.14
isclosed    0.14
Activations Density 0.029%