Explanations
verbs and their various forms
Negative Logits
Token         Logit
iott          -0.80
ICAN          -0.77
accompanied   -0.76
ITIES         -0.69
mouth         -0.69
icult         -0.64
indef         -0.62
sterdam       -0.60
CHAT          -0.60
ipolar        -0.60
Positive Logits
Token         Logit
dule          1.09
lde           0.90
xual          0.89
Ń·            0.84
ption         0.78
lder          0.76
roo           0.76
lled          0.75
pherd         0.72
hett          0.72
Activations Density: 0.008%