Explanations
phrases indicating positive outcomes or success
because, due to, thanks to
Negative Logits
poň           -0.40
Alfaro        -0.38
sorpresa      -0.38
tuur          -0.36
zaten         -0.36
mł            -0.35
processable   -0.35
izowane       -0.35
Penga         -0.34
consigo       -0.34
Positive Logits
due           0.98
because       0.91
due           0.88
BECAUSE       0.87
thanks        0.85
DUE           0.84
because       0.84
благодаря     0.83   (Russian for "thanks to")
Because       0.83
Because       0.83
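The logit lists rank output tokens by how strongly this feature pushes their logits up or down, but the card does not say how the scores were computed. A common approach, assumed here, is to project the feature's decoder vector onto each column of the model's unembedding matrix (the direct path only, ignoring the final layer norm) and keep the k highest and k lowest scores. The function names `logit_effects` and `top_and_bottom`, the toy vocabulary, and all numbers are hypothetical.

```python
# Hypothetical sketch: rank vocabulary tokens by the direct logit effect of a
# feature's decoder direction. Assumes effect = decoder_vec . W_U[:, token],
# ignoring the final layer norm; not necessarily the dashboard's own method.

def logit_effects(decoder_vec, unembed, vocab):
    """Return (token, effect) pairs, one per vocabulary entry.

    decoder_vec: list of d_model floats (the feature's decoder direction).
    unembed: d_model x vocab_size nested list (rows are residual dimensions).
    vocab: list of token strings; tokens with the same surface form are kept
    as separate entries, which is why a list can show "due" twice.
    """
    pairs = []
    for j, token in enumerate(vocab):
        effect = sum(d * unembed[i][j] for i, d in enumerate(decoder_vec))
        pairs.append((token, effect))
    return pairs


def top_and_bottom(pairs, k):
    """Return the k most positive and the k most negative (token, effect) pairs."""
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy 2-dimensional model with a 4-token vocabulary (made-up numbers).
    vocab = [" because", " due", " surprise", " already"]
    unembed = [
        [0.6, 0.8, -0.2, -0.1],
        [0.4, 0.2, -0.4, 0.0],
    ]
    decoder_vec = [1.0, 0.5]
    pos, neg = top_and_bottom(logit_effects(decoder_vec, unembed, vocab), k=2)
    print("positive:", [(t, round(v, 2)) for t, v in pos])
    print("negative:", [(t, round(v, 2)) for t, v in neg])
```

With these toy numbers, " due" (0.9) and " because" (0.8) come out on top and " surprise" (-0.4) at the bottom. Keeping (token, score) pairs rather than a dict preserves distinct tokens that print identically.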
Activation Density: 0.025%
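The density figure is given without its definition. A common reading, assumed here, is the fraction of tokens in the evaluation sample on which the feature's activation is positive; under that reading, 0.025% means the feature fires on roughly 1 token in 4,000 (1 / 0.00025 = 4,000). The function name `activation_density`, the zero threshold, and the sample are hypothetical.

```python
# Hypothetical sketch: activation density as the fraction of token positions
# where the feature's activation exceeds a threshold (0 here, an assumption).

def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly greater than `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


if __name__ == "__main__":
    # Made-up sample: 8,000 token positions, the feature fires on 2 of them.
    sample = [0.0] * 7998 + [1.3, 0.6]
    density = activation_density(sample)
    print(f"{density:.3%}")
    print(f"about 1 in {round(1 / density):,} tokens")
```

Run on the made-up sample, this prints the same 0.025% shown on the card, i.e. about 1 firing per 4,000 tokens.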