INDEX
Explanations
phrases that convey contrast, causation, or consideration
Negative Logits
into      -0.49
pu        -0.45
заслу     -0.44
became    -0.44
Stop      -0.43
Into      -0.42
INTO      -0.42
hold      -0.41
had       -0.41
ADE       -0.40
Positive Logits
tagHelperRunner   0.81
eftersom          0.81
perquè            0.78
ponieważ          0.77
kerana            0.77
puisque           0.76
poichè            0.74
aunque            0.73
поскольку         0.73
deoarece          0.72
Activations Density 0.746%