INDEX
Explanations
updates or revisions within a text
instances of the word "UPDATE."
Negative Logits
atom      -0.83
azes      -0.81
ung       -0.78
odor      -0.77
ü         -0.77
fle       -0.77
rift      -0.76
course    -0.75
uana      -0.74
stood     -0.74
Positive Logits
UPDATE    1.25
UPDATE    1.17
ION       1.09
CLAIM     1.06
BOX       1.02
EDIT      1.02
INGTON    0.99
ABOUT     0.97
BOOK      0.97
REPORT    0.97
Activations Density: 0.011%