Explanations
references to sweat or perspiration
Negative Logits
олет      -0.19
ña        -0.17
mente     -0.16
aic       -0.16
yang      -0.15
ales      -0.15
HOOK      -0.15
edores    -0.14
ailles    -0.14
ozilla    -0.14
Positive Logits
shirt     0.33
sweat     0.32
Swe       0.28
Sweat     0.27
swe       0.27
shops     0.24
pants     0.23
shop      0.23
equity    0.22
汗        0.21
Activations Density 0.009%