Explanations
references to mouth-related imagery or descriptions
Negative Logits
º«         -0.15
ãĥ£        -0.15
ATRIX      -0.14
ÎŃÏģ       -0.14
calar      -0.14
utow       -0.14
infinit    -0.14
hea        -0.14
embro      -0.14
ipar       -0.14
Positive Logits
ful        0.34
piece      0.32
wash       0.29
water      0.28
-water     0.26
FUL        0.26
watering   0.25
pieces     0.25
feel       0.23
guards     0.23
Activations Density: 0.019%
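
The density figure can be read as a simple ratio. The sketch below assumes that "activations density" means the fraction of sampled token positions on which the feature's activation is above zero; the source does not define the term, so this reading, the function name activation_density, and the sample data are all hypothetical. Under that assumption, 0.019% corresponds to roughly 19 active tokens per 100,000.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: density = (# active positions) / (# sampled positions).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 100,000 token positions, 19 of them active.
acts = [0.0] * 100_000
for i in range(19):
    acts[i * 5000] = 1.5  # arbitrary positive activation value

density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.019%
```

A feature this sparse fires on very few tokens, so the explanation and logit lists describe behavior on a small slice of the data.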