INDEX
Explanations
variations of the word "mut."
Negative Logits
ed       -0.16
aug      -0.16
haled    -0.15
idon     -0.15
off      -0.15
croll    -0.14
inkle    -0.14
oze      -0.14
endir    -0.14
rest     -0.14
Positive Logits
mut      0.25
Mut      0.24
agen     0.24
mut      0.23
ual      0.23
Mutual   0.23
iple     0.22
mutual   0.22
MUT      0.21
Mut      0.21
Activations Density 0.008%
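
The density figure above can be read as the share of token positions on which the feature fires at all. The page does not define the metric, so the following is only a minimal sketch under that assumption; the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature is active (activation > threshold). The source does not say so.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 8 of 100,000 token positions.
acts = [0.0] * 100_000
for i in range(8):
    acts[i * 1000] = 1.5

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.008%
```

Under this reading, a density of 0.008% means the feature is active on roughly 8 of every 100,000 tokens, which fits a sparse feature tied to a narrow family of tokens.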