INDEX
Explanations
Questions regarding someone's profession, attire, origin, or desires.
Negative Logits
are           -1.13
were          -1.07
have          -0.91
aren          -0.85
weren         -0.84
ARE           -0.84
have          -0.81
themselves    -0.81
WERE          -0.77
mają          -0.76
Positive Logits
does          0.94
Does          0.86
Does          0.85
does          0.85
DOES          0.73
DOES          0.69
doth          0.69
doesn         0.67
doesn         0.64
Has           0.63
Activations Density: 0.953%