Explanations
phrases related to transformation or changing objects into another form
Negative Logits
erity      -0.75
inately    -0.72
ran        -0.68
yright     -0.68
ritz       -0.67
forbids    -0.66
cies       -0.65
raint      -0.64
no         -0.64
enance     -0.64
Positive Logits
usable      0.84
something   0.72
ãĥ¼ãĥ      0.69
a           0.68
fodder      0.68
ashes       0.67
surrogate   0.67
an          0.65
profitable  0.62
quished     0.61
Activations Density 0.052%