INDEX
Explanations
words indicating causation or reason
the phrase "So" used to make transitions or conclusions in statements
Negative Logits
saf          -0.71
thro         -0.67
``(          -0.63
¢            -0.62
exclusive    -0.62
Â            -0.59
'';          -0.56
ASED         -0.56
sk           -0.55
Scand        -0.55
Positive Logits
oner         1.26
bered        1.00
fter         0.98
apy          0.95
othes        0.94
FTWARE       0.94
oths         0.85
ooo          0.84
othe         0.84
aps          0.84
Activations Density: 0.055%