INDEX
Explanations
names of individuals or entities
proper nouns, particularly names of people
Negative Logits
addons        -0.74
respectively  -0.73
izoph         -0.65
thora         -0.63
Tokens        -0.63
notation      -0.62
overlap       -0.60
overwhelming  -0.60
depending     -0.59
20439         -0.58
Positive Logits
remembers   1.07
celebrates  1.04
writes      1.01
teaches     0.98
poses       0.97
joins       0.97
Profile     0.96
speaks      0.94
greets      0.93
wears       0.92
Activations Density: 0.233%