INDEX
Explanations
news article metadata such as publication dates and titles
instances of the word "First"
Negative Logits
Token          Logit
Wer            -0.64
mble           -0.62
steen          -0.62
Canaver        -0.62
termination    -0.61
edom           -0.61
Genie          -0.59
avoidance      -0.58
geries         -0.58
nery           -0.58
Positive Logits
Token          Logit
Published      0.90
Posts          0.77
Posted         0.75
posted         0.74
archived       0.74
Upload         0.68
Posted         0.66
post           0.64
published      0.64
Download       0.63
Activations Density: 0.027%