Explanations
expressions of opinion or feedback
Negative Logits
tera        -0.18
icie        -0.15
esan        -0.15
tero        -0.15
ongan       -0.14
vero        -0.14
abant       -0.14
Incontri    -0.14
Pok         -0.14
Ale         -0.14
Positive Logits
like        0.47
like        0.33
Like        0.32
Like        0.31
_like       0.31
likes       0.30
LIKE        0.29
.like       0.27
como        0.26
như         0.26
Activations Density: 0.037%