Explanations
possessive + action or quality
Negative Logits
Token           Logit
æĪĴ             -0.09
akk             -0.09
afil            -0.08
Smy             -0.08
goodwill        -0.08
blame           -0.08
inher           -0.08
Fir             -0.08
sincere         -0.08
svm             -0.08
Positive Logits
Token           Logit
efforts         0.24
contribution    0.22
role            0.20
contributions   0.19
sake            0.19
effort          0.17
actions         0.17
part            0.16
help            0.15
work            0.15
Activations Density: 0.033%