INDEX
Explanations
sentences indicating urgency or impending consequences
Negative Logits
Token           Logit
($__            -0.55
üğ              -0.53
antaranya       -0.52
entanto         -0.51
unnitel         -0.50
GroupLayout     -0.50
tegens          -0.49
ostante         -0.49
désolés         -0.48
böz             -0.47
Positive Logits
Token             Logit
automatically     0.90
automaticamente   0.82
effectively       0.78
автоматически     0.78
Personendaten     0.77
automáticamente   0.75
automatisch       0.73
risk              0.71
ówczas            0.70
implicitly        0.69
Activations Density: 0.463%