INDEX
Explanations
phrases indicating news articles or reports, possibly related to government or official statements
instances of the word "the."
Negative Logits
books       -0.79
resemb      -0.71
calling     -0.68
reports     -0.68
bytes       -0.67
makers      -0.67
making      -0.66
replies     -0.66
memes       -0.65
indicators  -0.65
POSITIVE LOGITS
concentrate  0.81
iling        0.80
maximize     0.78
complete     0.78
bern         0.77
rouse        0.77
ilet         0.76
cca          0.75
iler         0.74
reach        0.74
Activations Density 0.000%