INDEX
Explanations
actions related to maintaining balance and moderation in various activities
Negative Logits
èĥ½        -0.18
erville    -0.16
correctly  -0.16
еÑı        -0.15
èĥ½å¤Ł     -0.15
aille      -0.15
frequ      -0.14
rys        -0.14
çĦ¶        -0.14
urma       -0.13
Positive Logits
anytime       0.27
ัà¸Ļà¹Ħà¸Ķ     0.20
à¹Ħà¸Ķ        0.20
easily        0.20
anywhere      0.19
à¹Ħà¸Ķ        0.18
feas          0.17
opies         0.16
Saf           0.16
à¹Įà¹Ħà¸Ķ     0.16
Activations Density: 0.890%