INDEX
Explanations
phrases that express appreciation or gratitude towards others
Negative Logits
hani       -0.15
heet       -0.14
AGO        -0.14
Ñģебе      -0.14
killer     -0.14
.ribbon    -0.14
ãģĵãĤį     -0.14
hest       -0.13
.gc        -0.13
ắn         -0.13
Positive Logits
being      0.33
having     0.29
being      0.23
Being      0.23
Being      0.22
Having     0.21
Having     0.19
daring     0.19
having     0.19
not        0.18
Activations Density 0.095%