INDEX
Explanations
phrases indicating causality or consequence
statements indicating causality or conclusions
Negative Logits
ãĤ´ãĥ³    -0.65
MH        -0.61
TAG       -0.61
bage      -0.58
Guard     -0.57
ALS       -0.55
HT        -0.54
feet      -0.54
sands     -0.54
Him       -0.54
Positive Logits
soever     0.72
they       0.70
eday       0.65
ocom       0.65
agnar      0.65
whoever    0.64
/+         0.63
we         0.63
prest      0.62
umers      0.61
Activations Density 0.207%
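
The page does not define how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature has a nonzero activation: the function name `activation_density`, the `threshold` parameter, and the sample data below are all hypothetical and chosen only to reproduce the arithmetic behind a value like 0.207%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means active positions / total positions.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 207 of them active.
sample = [1.5] * 207 + [0.0] * 99_793
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.207%
```

Under this reading, a density of 0.207% means the feature fires on roughly 2 of every 1,000 tokens, so it is sparse but not rare.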