Explanations
expressions of effort and commitment to achieving goals
Negative Logits
adier    -0.18
sond     -0.18
erce     -0.15
ment     -0.15
moid     -0.15
лова     -0.15
Trident  -0.14
çº       -0.14
Grade    -0.14
auen     -0.14
Positive Logits
oÅĪ      0.15
Prov     0.14
Prov     0.14
opaque   0.14
rest     0.14
bon      0.14
icons    0.14
iti      0.14
/all     0.13
zd       0.13
Activation density: 0.052%