INDEX
Explanations
references to physical appearances and comparisons
Negative Logits
Token         Logit
lal           -0.15
evin          -0.13
tron          -0.13
oppel         -0.13
aud           -0.13
ipers         -0.13
lion          -0.13
ailand        -0.13
Jacobs        -0.13
subcategory   -0.13
Positive Logits
Token         Logit
like          0.53
Like          0.42
likes         0.41
LIKE          0.39
like          0.38
Like          0.38
.like         0.35
seperti       0.34
_like         0.32
como          0.32
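The two logit lists rank vocabulary tokens by how strongly this feature pushes their next-token logits down or up. A minimal sketch of how such lists can be produced, assuming (this page does not say so) that each score is the feature's decoder direction projected through the model's unembedding matrix. The names `logit_effects`, `decoder_row` and `unembed` are hypothetical, and a real dashboard would use tensors rather than nested lists.

```python
from typing import Sequence


def logit_effects(
    decoder_row: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> tuple[list[tuple[str, float]], list[tuple[str, float]]]:
    """Return (top positive, top negative) token/score pairs for one feature.

    Hypothetical inputs (assumed, not from the dashboard):
      decoder_row: the feature's decoder direction, length d_model.
      unembed:     d_model x vocab_size unembedding matrix.
      vocab:       token strings, one per unembedding column.
    """
    d_model = len(decoder_row)
    # Dot the decoder direction with each token's unembedding column.
    scores = [
        sum(decoder_row[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by score: most negative tokens come first.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]        # strongest negative effects
    positive = ranked[::-1][:k]  # strongest positive effects
    return positive, negative


# Toy example with made-up numbers (d_model = 2, four tokens).
vocab = ["like", "como", "lal", "tron"]
decoder_row = [1.0, 0.5]
unembed = [
    [0.4, 0.3, -0.1, 0.0],
    [0.2, 0.1, -0.1, -0.2],
]
pos, neg = logit_effects(decoder_row, unembed, vocab, k=2)
print([t for t, _ in pos])  # ['like', 'como']
print([t for t, _ in neg])  # ['lal', 'tron']
```

Sorting the full score list once and slicing both ends keeps the sketch simple; for a real vocabulary of tens of thousands of tokens, a partial top-k selection would avoid the full sort.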
Activations Density 0.086%
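The density figure can be read as a simple ratio. This sketch assumes, as is common for sparse-autoencoder dashboards but not stated on this page, that density is the share of sampled tokens on which the feature's activation exceeds zero. `activation_density` and its `threshold` parameter are hypothetical. Under that reading, 0.086% means the feature fires on roughly 86 of every 100,000 tokens.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation is above `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# 86 firing tokens out of 100,000 sampled tokens.
acts = [0.0] * 99_914 + [1.0] * 86
print(f"{activation_density(acts):.3%}")  # 0.086%
```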