INDEX
Explanations
references to asking questions or making requests
Negative Logits
Monfieur         -0.78
Sodom            -0.71
SourceChecksum   -0.70
poffible         -0.69
noastre          -0.69
autorytatywna    -0.68
Shakspeare       -0.67
Efq              -0.65
féminine         -0.64
raiſ             -0.63
Positive Logits
yourself   1.10
you        1.09
your       0.97
You        0.95
你不       0.86
你         0.83
yourself   0.81
你还       0.80
Yourself   0.79
넌         0.78
Activations Density: 0.141%