INDEX
Explanations
expressions of uncertainty or denial
Negative Logits
Token       Logit
were        -1.13
are         -0.85
were        -0.84
weren       -0.81
Were        -0.75
WERE        -0.73
don         -0.72
voltak      -0.71
aren        -0.66
are         -0.66
Positive Logits
Token       Logit
itſelf      0.97
Monfieur    0.84
Appears     0.77
recognises  0.76
appears     0.75
penetrates  0.74
Serves      0.73
ftagPool    0.73
Beſ         0.73
does        0.73
Activations Density 0.131%
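
The page does not say how the density figure is computed. A common convention is the fraction of token positions on which the feature's activation is nonzero. The sketch below illustrates that convention only; the function name `activation_density`, the threshold, and the sample data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: the feature fires on 131 of 100,000 token positions.
sample = [1.0] * 131 + [0.0] * (100_000 - 131)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.131% means the feature is active on roughly 13 of every 10,000 tokens.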