INDEX
Explanations
conjunctions and transitions that connect ideas or statements
Negative Logits
arden     -0.15
ucs       -0.15
chts      -0.15
ucker     -0.15
ros       -0.14
ulla      -0.14
.Pow      -0.14
locker    -0.13
ula       -0.13
aca       -0.13
Positive Logits
Cay       0.16
forth     0.15
igi       0.14
ysz       0.14
957       0.14
cum       0.14
zp        0.14
domic     0.14
mand      0.13
oty       0.13
Activations Density: 0.074%