Explanations
language that expresses contrast and conditionality
Negative Logits
raw         -0.43
original    -0.42
(blank)     -0.41
kle         -0.39
es          -0.38
oka         -0.38
voran       -0.38
stat        -0.38
sharing     -0.37
پار         -0.37
Positive Logits
when        1.60
during      1.42
when        1.35
cuando      1.35
during      1.34
DURING      1.29
quando      1.23
later       1.22
WHEN        1.21
lorsque     1.20
Activations Density 0.931%
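
The density figure can be read as the share of tokens on which the feature fires. The sketch below is a minimal illustration under that assumption. The page does not state how the value is computed, and the function name, the threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature
    is active. The threshold of 0.0 is illustrative, not taken from the page.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 1,000 tokens, with the feature active on 9 of them.
sample = [0.0] * 991 + [2.5] * 9
print(f"{activation_density(sample):.3%}")  # prints 0.900%
```

Under this reading, a density of 0.931% would correspond to the feature being active on roughly 9 of every 1,000 tokens.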