INDEX
Explanations
names mentioned in a text
occurrences of the word "name" as part of personal introductions
Negative Logits
yrinth    -0.92
EMS       -0.92
aunders   -0.88
Js        -0.80
idth      -0.74
romy      -0.72
erg       -0.71
ourse     -0.71
BSD       -0.70
iths      -0.70
Positive Logits
plates     1.03
plate      0.96
aliases    0.93
tag        0.87
paces      0.86
tags       0.83
surname    0.81
redacted   0.78
akes       0.77
surn       0.74
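Logit lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix W_U, then reading off the most negative and most positive entries. The dashboard does not state its exact procedure, so the following is only a minimal sketch under that assumption. The helpers `logit_effects` and `top_and_bottom`, the two-dimensional decoder vector, and the five-token vocabulary are all hypothetical, and the numbers are made up rather than taken from this feature.

```python
# Hypothetical sketch: rank tokens by how a feature's decoder direction shifts their logits.
# Assumes logit effect = decoder_dir @ W_U; the dashboard does not document its exact method.


def logit_effects(decoder_dir, unembed, vocab):
    """Return (token, effect) pairs, where effect is decoder_dir dotted with column j of W_U."""
    effects = []
    for j, token in enumerate(vocab):
        # Sum over the model dimension i of decoder_dir[i] * W_U[i][j].
        score = sum(d * row[j] for d, row in zip(decoder_dir, unembed))
        effects.append((token, score))
    return effects


def top_and_bottom(effects, k):
    """Split effects into the k most negative (ascending) and k most positive (descending)."""
    ranked = sorted(effects, key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = ranked[::-1][:k]
    return negative, positive


# Toy data (made up): d_model = 2, vocabulary of 5 tokens.
vocab = ["plate", "tag", "surname", "idth", "BSD"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.6, 0.5, 0.4, -0.3, -0.5],  # row 0 of W_U
    [0.4, 0.3, 0.2, -0.4, -0.2],  # row 1 of W_U
]

negative, positive = top_and_bottom(logit_effects(decoder_dir, unembed, vocab), k=2)
print("Negative:", [t for t, _ in negative])  # Negative: ['BSD', 'idth']
print("Positive:", [t for t, _ in positive])  # Positive: ['plate', 'tag']
```

A real dashboard would use the trained SAE's decoder row and the model's full vocabulary; some implementations also normalize the decoder direction first, which rescales the effects but does not change their ranking.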
Activations Density 0.029%
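Activation density is the share of token positions on which the feature fires. A density of 0.029% is 0.00029, or about 29 activations per 100,000 tokens (roughly 290 per million). As a minimal sketch, assume "fires" means an activation strictly greater than zero, which is typical for ReLU-based sparse autoencoders. The helper `activation_density` is hypothetical and the activations are synthetic.

```python
# Hypothetical sketch: activation density as the fraction of token positions where a feature fires.
# Assumes "fires" means activation > 0, as with ReLU-based sparse autoencoders.


def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above threshold (0.0 for empty input)."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Synthetic data: the feature fires on 29 of 100,000 token positions.
acts = [0.0] * 99_971 + [1.5] * 29
density = activation_density(acts)
print(f"{density:.3%}")  # 0.029%
```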