INDEX
Explanations
instances of various verbs and actions
Negative Logits
inger      -0.17
acie       -0.15
iar        -0.15
å©         -0.13
_CO        -0.13
ierte      -0.13
/native    -0.13
/on        -0.13
Gent       -0.13
oux        -0.13
Positive Logits
OAD        0.15
trú        0.14
.man       0.14
magna      0.14
izard      0.14
umen       0.14
ilton      0.14
PEC        0.13
UGIN       0.13
Wholesale  0.13
Activations Density 0.084%