Explanations
verbs indicating some form of removal, reversal, or suspension
terms related to the cessation or reversal of actions or processes
Negative Logits
rics      -0.70
resents   -0.62
acion     -0.61
idth      -0.61
Fit       -0.60
ocl       -0.59
dom       -0.59
sit       -0.58
ipeg      -0.58
Pr        -0.58
Positive Logits
by            0.84
aback         0.73
entirely      0.69
BY            0.67
livious       0.66
ĸļ            0.65
oing          0.64
proport       0.64
accordingly   0.63
unanimously   0.62
Activations Density: 0.262%