INDEX
Explanations
topics related to societal issues and social justice
Negative Logits
eniable    -0.15
endir      -0.14
šti        -0.13
vůbec      -0.13
nam        -0.13
うち       -0.13
elden      -0.12
ΐ          -0.12
adesh      -0.12
iaux       -0.12
Positive Logits
like       1.45
Like       1.13
Like       1.02
like       0.99
LIKE       0.98
_like      0.88
như        0.83
.like      0.82
como       0.82
seperti    0.81
Activations Density: 1.218%