INDEX
Explanations
personal names or pronouns in the text
Negative Logits
arth    -0.18
far     -0.17
ople    -0.17
aks     -0.16
aring   -0.15
isure   -0.15
opard   -0.14
lien    -0.14
apon    -0.14
699     -0.14
Positive Logits
issing          0.17
ãĥĥãĤ¯          0.17
ibel            0.16
ombs            0.15
lesen           0.15
jc              0.15
bout            0.15
ibling          0.15
MetroFramework  0.15
ukes            0.15
Activations Density: 0.066%