Explanations
words denoting causation or attribution
phrases indicating causation or reasons
Negative Logits
anka         -0.60
Chains       -0.58
clipboard    -0.57
stars        -0.57
hold         -0.56
STATES       -0.53
snipp        -0.52
Carbuncle    -0.52
ahs          -0.52
orum         -0.52
Positive Logits
partly        1.33
chiefly       1.22
principally   1.16
largely       1.16
mainly        1.13
primarily     1.11
partially     1.03
mostly        0.98
solely        0.96
entirely      0.92
Activations Density: 0.144%