INDEX
Explanations
actions related to physical movement or interaction
Negative Logits
ortium        -0.68
][            -0.67
xus           -0.61
inct          -0.61
ocaust        -0.60
yne           -0.59
enny          -0.59
esa           -0.58
ague          -0.58
reshold       -0.58
Positive Logits
accordingly   0.95
himself       0.73
thereafter    0.71
therein       0.68
alike         0.64
promptly      0.61
several       0.59
consequently  0.59
herself       0.59
them          0.58
Activations Density: 0.232%