INDEX
Explanations
names of historical figures
Negative Logits
bothering    -0.72
higher       -0.66
Untitled     -0.65
Ĥ¬           -0.63
pregn        -0.62
thinking     -0.61
intent       -0.60
veyard       -0.60
necessary    -0.59
giving       -0.59
Positive Logits
'll            1.13
underwent      1.02
participated   1.01
graduated      1.01
'd             1.00
pherd          1.00
oversaw        1.00
survived       0.99
earns          0.96
scored         0.96
Activations Density: 0.355%