Explanations
instances of evaluation or description of character and behavior
Negative Logits
Token        Logit
íĨłíĨł       -0.17
biên         -0.16
#            -0.15
@nate        -0.15
ëį°ìĿ´íĬ¸    -0.15
IFn          -0.14
fkk          -0.14
пÑĢеж        -0.14
âĦĸâĦĸ       -0.14
asz          -0.14
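Several entries in the negative-logit list (íĨłíĨł, ëį°ìĿ´íĬ¸, пÑĢеж, âĦĸâĦĸ) look like byte-level BPE vocabulary strings rather than readable text. The page does not say which tokenizer produced them, so the following is only a minimal sketch, assuming a GPT-2 style byte-to-unicode mapping. `decode_token` is a hypothetical helper, not part of any dashboard API. It passes characters outside the mapping through as UTF-8 (some entries appear partly decoded already) and returns the token unchanged when the result is not valid UTF-8.

```python
def _build_byte_decoder():
    """Invert the GPT-2 style byte-to-unicode table (assumed, not confirmed by the source)."""
    # Bytes that are displayed as themselves: '!'..'~', 0xA1..0xAC, 0xAE..0xFF.
    printable = list(range(33, 127)) + list(range(161, 173)) + list(range(174, 256))
    mapping = {chr(b): b for b in printable}
    n = 0
    for b in range(256):
        if b not in printable:
            # Remaining bytes are displayed as code points 256, 257, ...
            mapping[chr(256 + n)] = b
            n += 1
    return mapping


_BYTE_OF = _build_byte_decoder()


def decode_token(token: str) -> str:
    """Best-effort decode of a byte-level BPE token string to readable text."""
    raw = bytearray()
    for ch in token:
        if ch in _BYTE_OF:
            raw.append(_BYTE_OF[ch])
        else:
            # Character is not part of the byte alphabet: keep it as UTF-8.
            raw.extend(ch.encode("utf-8"))
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Token was probably readable text already (e.g. "biên").
        return token


if __name__ == "__main__":
    for tok in ["íĨłíĨł", "ëį°ìĿ´íĬ¸", "пÑĢеж", "âĦĸâĦĸ", "biên", "@nate"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under that assumption, the four byte-encoded entries decode to 토토, 데이트, преж, and №№, while plain entries such as biên and @nate are returned as they are. If the model uses a different tokenizer, the mapping table would need to be replaced accordingly.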
Positive Logits
Token         Logit
someone       0.19
accomplished  0.19
successful    0.19
someone       0.18
somebody      0.18
Successful    0.17
Successful    0.17
successful    0.17
successfully  0.17
success       0.16
Activations Density: 0.140%