Explanations
proper nouns or specific locations
specific names and titles, particularly associated with locations or entities
Negative Logits
terday      -0.88
wise        -0.76
selves      -0.74
theless     -0.74
extraord    -0.70
sanity      -0.70
istg        -0.70
partName    -0.67
LAST        -0.65
Extras      -0.65
Positive Logits
onian       1.27
ocene       1.04
ospels      0.95
opian       0.93
osphere     0.93
venth       0.91
leys        0.90
orian       0.88
waters      0.87
ilian       0.86
Activations Density: 0.321%