INDEX
Explanations
phrases related to economic costs or values
occurrences of the word "the"
Negative Logits
wisely          -0.72
aloud           -0.68
respectfully    -0.65
bg              -0.64
diligently      -0.62
consulted       -0.62
representing    -0.61
bid             -0.61
invests         -0.60
listened        -0.59
Positive Logits
entire          1.22
slightest       1.10
entirety        1.04
wearer          0.99
whole           0.98
same            0.98
widest          0.97
majority        0.94
vast            0.94
latter          0.93
Activations Density 0.295%