INDEX
Explanations
phrases related to social issues or commentary
instances of the word "the."
Negative Logits
thood          -0.72
eatures        -0.69
Alternatively  -0.68
nesty          -0.65
OSH            -0.64
Site           -0.64
è£ıè           -0.64
ason           -0.64
aken           -0.63
besides        -0.63
Positive Logits
rest           1.00
slightest      1.00
ses            0.98
smallest       0.94
entirety       0.88
hardest        0.87
whole          0.87
brightest      0.87
vast           0.86
heaviest       0.86
Activations Density 0.196%