Explanations
names and surnames, potentially from various contexts
proper nouns, particularly names and places
Negative Logits
gram        -0.61
independ    -0.60
invention   -0.58
daytime     -0.55
iliation    -0.54
distance    -0.54
novelty     -0.54
tabs        -0.53
object      -0.53
NPC         -0.53
Positive Logits
ño          1.05
heimer      1.03
kamp        0.99
bilt        0.97
tsky        0.94
uthor       0.93
gaard       0.91
haar        0.91
hoff        0.91
baum        0.90
Activations Density 0.308%