Explanations
proper nouns, particularly names of authors and works
Negative Logits
Token       Logit
cury        -0.16
warp        -0.15
played      -0.15
ustil       -0.15
podp        -0.14
partment    -0.13
051         -0.13
occo        -0.13
amina       -0.13
igest       -0.13
Positive Logits
Token       Logit
Writes      0.15
Author      0.14
writes      0.14
olkien      0.14
åº          0.14
ãĥ«ãĥĪ      0.14
onBind      0.13
æ´¾         0.13
Zus         0.13
ÙĪÛĮÛĮ      0.13
Activations Density: 0.288%