Explanations
words related to causes or reasons
phrases indicating causation or attribution
Negative Logits
anka       -0.70
umar       -0.59
Chains     -0.57
tatt       -0.57
uminati    -0.55
REPORT     -0.50
uve        -0.50
Wanted     -0.49
Finch      -0.49
Alright    -0.49
Positive Logits
partly        1.15
largely       1.10
chiefly       1.06
mainly        1.06
primarily     1.05
principally   1.04
mostly        0.95
entirely      0.90
partially     0.88
solely        0.84
Activations Density 0.139%