INDEX
Explanations
updated information in news articles
instances of the word "Updated"
Negative Logits
Token    Logit
stood    -0.91
ften     -0.77
marks    -0.77
bour     -0.76
acht     -0.75
OPA      -0.74
belt     -0.73
marked   -0.73
holes    -0.73
thing    -0.72
Positive Logits
Token       Logit
Updates     0.83
Updated     0.80
Update      0.75
Attempts    0.75
timestamp   0.74
Posted      0.74
Comments    0.73
Published   0.72
Dates       0.72
EDIT        0.72
Activations Density 0.008%