Explanations
strong verbs and action-related words in various contexts
Negative Logits
ahn       -0.17
atch      -0.14
egal      -0.14
aran      -0.14
emoc      -0.14
Lon       -0.13
postpone  -0.13
urai      -0.13
anders    -0.13
dagen     -0.13
Positive Logits
etc       0.27
тощо      0.21
etc       0.18
illac     0.15
dit       0.15
áº        0.15
ffects    0.14
eson      0.14
agnosis   0.14
以及      0.14
Activations Density 0.193%