INDEX
Explanations
phrases related to cause and effect
phrases that indicate causation or attribution
Negative Logits
Token    Logit
yahoo    -0.64
holder   -0.64
ura      -0.63
ahs      -0.63
ath      -0.62
anon     -0.60
oss      -0.60
zon      -0.60
mins     -0.59
aru      -0.59
Positive Logits
Token               Logit
partly              1.07
solely              0.89
chiefly             0.88
partially           0.79
disproportionately  0.79
factors             0.78
principally         0.78
exacerbated         0.76
largely             0.76
entirely            0.76
Activation Density: 0.208%