Explanations
phrases indicating causation or reasons
Negative Logits
Cæsar       -0.75
ГЛА         -0.68
atoare      -0.64
quæ         -0.64
ajuns       -0.64
gerichtet   -0.61
aikaa       -0.61
ovací       -0.60
valdi       -0.60
yszer       -0.60
Positive Logits
wegen       0.99
Because     0.98
karena      0.96
because     0.94
owing       0.94
بسبب        0.94
Because     0.94
vanwege     0.92
because     0.92
Aufgrund    0.90
Activations Density 0.168%
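
As a rough illustration of what a density figure like the one above might measure, the sketch below assumes it is the fraction of tokens on which the feature's activation is nonzero. That definition, the function name `activation_density`, and the sample values are assumptions for illustration and are not taken from this page.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly above `threshold`.

    Assumption: "density" means the share of tokens on which the
    feature fires at all; the page itself does not define the term.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical sample: the feature fires on 2 of 8 tokens.
sample = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.7, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.168% would correspond to the feature firing on roughly 17 of every 10,000 tokens.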