INDEX
Explanations
phrases related to news headlines or current events
occurrences of the word "the"
Negative Logits
thood      -0.74
leeve      -0.64
iffe       -0.63
Rahul      -0.60
assume     -0.59
Yuri       -0.58
Gerard     -0.58
aba        -0.57
ALT        -0.57
suppose    -0.57
Positive Logits
ses        1.25
same       1.16
longest    1.06
entire     1.05
fastest    1.04
slightest  1.03
latter     1.03
smallest   1.01
hardest    0.97
widest     0.97
Activations Density: 0.332%