INDEX
Explanations
verbs related to actions or behaviors
Negative Logits
Token       Logit
mare        -0.74
lights      -0.67
mares       -0.64
antha       -0.59
Lank        -0.58
Methods     -0.55
Statistics  -0.54
)=(         -0.53
Canal       -0.53
Azerb       -0.53
Positive Logits
Token       Logit
omsday      0.97
ppel        0.88
le          0.84
pez         0.82
xx          0.81
justice     0.80
lez         0.79
laundry     0.78
omething    0.77
herty       0.76
Activations Density 0.095%