Explanations
words associated with health, wellness, and social interactions
Negative Logits
105         -0.17
ов          -0.15
andra       -0.14
lu          -0.14
iegel       -0.14
Stopping    -0.14
halt        -0.14
ozo         -0.14
107         -0.14
inkel       -0.14
Positive Logits
actionTypes       0.15
uÄį               0.15
iali              0.15
_PACK             0.15
\base             0.15
ToAdd             0.14
bero              0.14
urum              0.14
út                0.14
------+------+    0.14
Activations Density 0.037%