Explanations
verbs indicating achievement or success
Negative logits
  Yourself      -0.19
  .FontStyle    -0.18
  /REC          -0.18
  yourselves    -0.17
  yourself      -0.17
  unp           -0.15
  ourselves     -0.15
  svůj          -0.14
  oneself       -0.14
  pid           -0.14
Positive logits
  us                     0.22
  不了 (ä¸įäºĨ)          0.16
  him                    0.15
  them                   0.15
  isas                   0.14
  McGr                   0.14
  bä                     0.14
  裕 (è£ķ)               0.14
  angles                 0.14
  alog                   0.14
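The two logit lists above are ranked views of a single per-token score: the most negative entries and the most positive entries. A minimal sketch of that ranking step follows, assuming the per-token logit effects are already available as a plain token-to-value mapping. The function name `top_logits` is hypothetical, and the sample values are copied from the tables above.

```python
import heapq

def top_logits(effects: dict[str, float], k: int = 10):
    """Return the k most negative and k most positive (token, value) pairs.

    Hypothetical helper: assumes `effects` maps token strings to logit effects.
    """
    negative = heapq.nsmallest(k, effects.items(), key=lambda kv: kv[1])
    positive = heapq.nlargest(k, effects.items(), key=lambda kv: kv[1])
    return negative, positive

# Sample values taken from the dashboard tables above.
effects = {
    "Yourself": -0.19, ".FontStyle": -0.18, "yourselves": -0.17,
    "us": 0.22, "him": 0.15, "them": 0.15,
}
neg, pos = top_logits(effects, k=2)
print(neg)  # [('Yourself', -0.19), ('.FontStyle', -0.18)]
print(pos)  # [('us', 0.22), ('him', 0.15)]
```

`heapq.nsmallest`/`nlargest` avoid sorting the whole vocabulary when only the top k entries are needed, and ties keep their original order.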
Activation density: 0.311%
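Activation density is read here as the share of token positions on which the feature fires. A minimal sketch, assuming "fires" means a strictly positive activation. The helper name `activation_density` and the sample activations are hypothetical illustrations, not data from this feature.

```python
def activation_density(activations: list[float]) -> float:
    """Fraction of token positions with a strictly positive activation.

    Assumption: density = (# positions with activation > 0) / (# positions).
    """
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0) / len(activations)

# Hypothetical example: 3 active positions out of 1,000.
acts = [0.0] * 997 + [1.2, 0.4, 3.1]
print(f"{activation_density(acts):.3%}")  # 0.300%
```

Under this definition, a density of 0.311% means the feature is active on roughly 3 of every 1,000 tokens.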