INDEX
Explanations
edits made to a text
instances of the word "edit" or related terms indicating editing actions
Negative Logits
Bucc      -0.65
avement   -0.60
cling     -0.59
squared   -0.59
gging     -0.59
Jindal    -0.58
fund      -0.58
Fernand   -0.58
gart      -0.58
footed    -0.58
Positive Logits
edit      0.94
ional     0.92
ors       0.89
orship    0.88
Edit      0.85
IONS      0.82
OR        0.81
EDIT      0.81
rator     0.78
arro      0.76
Activations Density: 0.011%