INDEX
Explanations
statements about the consequences or results of actions
Negative Logits
quindi      -0.15
THEN        -0.14
kaldı       -0.14
billeder    -0.14
usk         -0.13
hence       -0.13
iyim        -0.13
então       -0.13
окрема      -0.13
لذا         -0.13
Positive Logits
although    0.35
while       0.35
when        0.35
since       0.33
during      0.31
unlike      0.29
despite     0.29
when        0.28
if          0.27
whereas     0.27
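The two lists rank vocabulary tokens by how strongly this feature's output lowers or raises their logits. The sketch below shows one common way such lists are produced. It assumes, without confirmation from this page, that each score is the dot product of the feature's decoder vector with a token's unembedding column. `decoder_row`, `unembed`, `vocab`, and the numbers are hypothetical toy values, not this feature's weights.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_row: Sequence[float], unembed: List[List[float]]) -> List[float]:
    """Project one feature's decoder vector through the unembedding.

    Assumption: unembed is d_model x vocab_size (row-major), so the score for
    token t is sum_i decoder_row[i] * unembed[i][t].
    """
    d_model = len(decoder_row)
    vocab_size = len(unembed[0])
    return [
        sum(decoder_row[i] * unembed[i][t] for i in range(d_model))
        for t in range(vocab_size)
    ]

def top_tokens(
    scores: Sequence[float], vocab: Sequence[str], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (negative, positive): the k lowest- and k highest-scoring tokens."""
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return negative, positive

# Hypothetical toy model: d_model = 2, four-token vocabulary.
vocab = ["hence", "although", "while", "then"]
unembed = [
    [-1.0, 2.0, 1.0, -0.5],
    [0.5, 1.0, 1.5, -1.0],
]
decoder_row = [0.5, 0.25]

scores = logit_effects(decoder_row, unembed)
neg, pos = top_tokens(scores, vocab, k=2)
print(neg)  # [('then', -0.5), ('hence', -0.375)]
print(pos)  # [('although', 1.25), ('while', 0.875)]
```

Real dashboards run the same ranking over the full vocabulary. Because the tokenizer may split words differently, the same surface word (here "when") can appear more than once.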
Activation Density: 0.935%
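A minimal sketch of how an activation density figure like this could be computed. It assumes, without confirmation from this page, that density is the percentage of sampled tokens on which the feature's activation is strictly positive. `activation_density` and the sample values are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)

# Hypothetical sample: 2 active tokens out of 8.
acts = [0.0, 0.0, 1.7, 0.0, 0.0, 0.0, 3.2, 0.0]
print(activation_density(acts))  # 25.0
```

Under this reading, the reported 0.935% means the feature fires on a little over 9 of every 1,000 sampled tokens.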