INDEX
Explanations
conversational phrases asking about well-being
Negative Logits
nos       -0.18
aben      -0.16
ify       -0.15
Relief    -0.15
nos       -0.15
VERR      -0.14
Veter     -0.14
nete      -0.14
_blk      -0.13
ARISING   -0.13
Positive Logits
healthy   0.24
healthy   0.20
一切      0.20
enjoying  0.19
khỏe      0.19
health    0.18
urdy      0.18
HEALTH    0.18
Healthy   0.18
健康      0.18
Activations Density 0.195%