Explanation
Actions related to transition or change
Negative Logits
ING       -0.20
ing       -0.18
ERS       -0.16
ify       -0.16
ingga     -0.16
olini     -0.15
ors       -0.15
ers       -0.15
tro       -0.15
asco      -0.15

Positive Logits
of        0.21
redient   0.17
-of       0.16
agent     0.15
dol       0.15
moment    0.14
agent     0.14
point     0.14
ë¡ł       0.14
effect    0.14
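One common way to produce logit lists like the two above is to measure the feature's direct effect on every output token and keep the most negative and most positive entries. The sketch below assumes that effect is the feature's decoder direction multiplied by the model's unembedding matrix; the names `decoder_dir`, `W_U`, `vocab`, and `logit_effects` are hypothetical, and this page does not confirm how its values were computed or whether any normalization was applied.

```python
import numpy as np


def logit_effects(decoder_dir, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, effect) pairs.

    Assumption: the effect on each token's logit is the feature's decoder
    direction (shape: d_model) projected through the unembedding matrix
    W_U (shape: d_model x vocab_size). No normalization is applied here.
    """
    effects = decoder_dir @ W_U           # one value per vocabulary token
    order = np.argsort(effects)           # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary
# (illustrative numbers only, not taken from the dashboard above).
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.2, -0.1, 0.0, 0.3],
                [0.5, 0.5, 0.5, 0.5]])
vocab = ["of", "ing", "tro", "agent"]
neg, pos = logit_effects(decoder_dir, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Repeated entries such as the two `agent` rows usually reflect distinct vocabulary tokens (for example, with and without a leading space) whose whitespace is not visible in a plain-text listing.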

Activation Density: 0.082%
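Activation density is typically the fraction of token positions on which a feature fires. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero; the `activation_density` helper and its `threshold` parameter are hypothetical and not taken from this page:

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is strictly
    greater than the threshold (0.0 by default).
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))


# Illustrative activations over four token positions: one is active.
print(activation_density([0.0, 0.0, 0.5, 0.0]))
```

Read this way, a density of 0.082% means the feature is active on 0.00082 of tokens, or roughly once every 1,220 tokens (1 / 0.00082 ≈ 1,219.5).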