Explanations
Phrases that describe appearances or visual characteristics
Negative Logits
roker      -0.15
874        -0.15
utra       -0.14
ADX        -0.14
گیری       -0.14
iet        -0.14
истра      -0.14
looking    -0.14
津         -0.14
oeff       -0.13
Positive Logits
like       0.43
Like       0.36
like       0.35
Like       0.34
LIKE       0.31
LIKE       0.30
-like      0.28
_like      0.27
likes      0.25
.like      0.24
Activation Density: 0.007%
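Activation density is usually reported as the percentage of sampled tokens on which the feature is active. The sketch below assumes "active" means an activation strictly above a threshold of 0; the function name `activation_density` and the sample data are hypothetical. Under that definition, 0.007% corresponds to roughly 7 firing tokens per 100,000.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (hypothetical sketch)."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # Made-up sample: 7 firing tokens out of 100,000.
    sample = [0.0] * 99_993 + [1.0] * 7
    print(f"{activation_density(sample):.3f}%")
```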