INDEX
Explanations
phrases related to expressing love or admiration for someone
instances of letters and initials, particularly those that might signify names or titles
Negative Logits
fragmentation   -0.70
Dickinson       -0.62
timing          -0.62
ativity         -0.61
coni            -0.60
friction        -0.60
sandbox         -0.60
poly            -0.60
phony           -0.59
aic             -0.59
Positive Logits
ought           1.14
ained           1.13
owed            1.03
stood           1.00
OULD            1.00
ounced          0.99
aughed          0.99
overed          0.98
icipated        0.98
poses           0.97
Activations Density 0.187%
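
The density figure above is most likely the fraction of sampled token positions on which the feature fires, though the page does not state its definition. Under that assumption, a minimal sketch of the calculation could look like the following. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical, chosen only so that the result matches 0.187%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature is active. The source page does not define the term.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 187 active positions out of 100,000 tokens.
acts = [0.0] * 99_813 + [1.0] * 187
density = activation_density(acts)
print(f"{density:.3%}")  # prints "0.187%"
```

A density this low would mean the feature is sparse: it is inactive on the vast majority of tokens.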