INDEX
Explanations
articles in a written text
instances of the word "Article" followed by a number
Negative Logits
Token     Logit
ITH       -0.86
adows     -0.85
aukee     -0.85
ascus     -0.73
eco       -0.72
inav      -0.72
awar      -0.72
oles      -0.71
escal     -0.70
aunder    -0.70
Positive Logits
Token      Logit
Continued  0.96
ICLE       0.92
Articles   0.90
Header     0.85
Article    0.84
ARTICLE    0.78
Consent    0.78
Says       0.74
Article    0.74
meal       0.72
Activations Density: 0.007%