INDEX
Explanations
headlines or titles of news articles
instances of the word "the" in various contexts
Negative Logits
Token     Logit
Ò         -0.80
thood     -0.72
\\\\      -0.71
akable    -0.70
omics     -0.69
@#        -0.68
perse     -0.68
����      -0.68
ãĥĺ       -0.66
¯         -0.66
Positive Logits
Token           Logit
implication     1.21
report          1.20
statement       1.20
article         1.13
document        1.08
spokesperson    1.08
resa            1.08
allegation      1.07
wording         1.07
odore           1.06
Activations Density 0.324%