INDEX
Explanations
negations and expressions of uncertainty or denial
Negative Logits
います  -0.77
itſelf  -0.76
rungsseite  -0.76
habet  -0.75
Majefty  -0.70
Monfieur  -0.69
retires  -0.69
hangs  -0.68
trône  -0.68
eorum  -0.68
Positive Logits
was  1.12
weren  1.05
wasn  1.04
didn  1.02
Was  0.97
Wasn  0.95
were  0.93
didn  0.92
had  0.91
Was  0.90
Activations Density: 0.081%