INDEX
Explanations
words related to actions or commands directed towards a specific person or group
verbs indicating action or commands
Negative Logits
Token         Logit
mith          -0.64
SPONSORED     -0.63
printf        -0.58
constitu      -0.57
Mehran        -0.56
dfx           -0.56
]=            -0.55
disg          -0.54
spons         -0.53
cv            -0.53
Positive Logits
Token         Logit
Yourself       1.39
yourself       1.24
ments          1.16
Your           1.15
your           1.14
yourselves     1.14
ings           1.07
ment           0.97
able           0.96
ables          0.95
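Lists like the two above are often produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive token scores. The sketch below assumes that method; the dashboard itself does not say how its lists were computed, and the names decoder_dir, W_U, and the toy numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Score each vocabulary token by decoder_dir @ W_U.

    Assumption: W_U has d_model rows and vocab_size columns, and
    decoder_dir is the (hypothetical) feature's decoder row of length d_model.
    """
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[d] * W_U[d][t] for d in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(scores: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k lowest-scoring and k highest-scoring tokens."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # ascending
    bottom = [(vocab[i], scores[i]) for i in order[:k]]           # most negative first
    top = [(vocab[i], scores[i]) for i in reversed(order[-k:])]   # most positive first
    return bottom, top


# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
decoder_dir = [1.0, 0.5]
W_U = [[1.0, -1.0, 0.0, 2.0],
       [0.0, 2.0, -2.0, 0.0]]
vocab = ["a", "b", "c", "d"]
scores = logit_effects(decoder_dir, W_U)
bottom, top = top_and_bottom(scores, vocab, k=2)
print("negative:", bottom)
print("positive:", top)
```

Sorting the token indices once gives both ends of the ranking, which mirrors how the dashboard shows a negative list and a positive list side by side.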
Activation Density: 0.301%
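Activation density is commonly read as the percentage of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above a threshold of zero. The dashboard does not state its threshold or sample size, and the function name and toy activations are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Toy example: the feature fires on 1 of 4 tokens.
density = activation_density([0.0, 0.0, 0.3, 0.0])
print(f"{density:.3f}%")
```

By this definition, a density of 0.301% means the feature is active on roughly 3 tokens in every 1,000, so it is a sparse feature.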