INDEX
Explanations
names or instances of references within a text
references to names or naming
Negative Logits
Token       Logit
elaide      -0.77
yrinth      -0.76
istar       -0.74
psey        -0.74
romy        -0.70
isexual     -0.68
icult       -0.67
berra       -0.67
Smy         -0.67
iership     -0.64
Positive Logits
Token        Logit
plates       1.58
plate        1.48
paces        1.20
paced        1.01
names        0.95
ames         0.91
calling      0.91
recognition  0.90
brand        0.87
akes         0.84
Activations Density: 0.046%