Explanations
References to specific names or entities, particularly people and their surnames
Negative Logits
Token          Logit
ledge          -0.16
naire          -0.15
favor          -0.15
kish           -0.15
favor          -0.14
resi           -0.14
gere           -0.14
witter         -0.14
Certificate    -0.14
hoo            -0.14
Positive Logits
Token          Logit
egov           0.19
loom           0.19
Her            0.17
itage          0.17
ules           0.17
amient         0.16
aint           0.16
Her            0.16
ogs            0.15
encia          0.15
Activations Density: 0.023%