Explanations
references to physical and mental well-being
Negative Logits
_physical   -0.17
rog         -0.16
物理        -0.16
Physics     -0.16
erez        -0.16
Physics     -0.15
кав         -0.15
aca         -0.15
/goto       -0.15
okit        -0.15
Positive Logits
ity         0.41
ITY         0.29
ities       0.26
s           0.23
/log        0.23
therapist   0.23
ized        0.22
therapists  0.22
/em         0.21
mente       0.21
Activations Density: 0.023%