INDEX
Explanations
references to cats or related cat terminology
Negative Logits
Token         Logit
")));         -0.80
Aguilera      -0.70
PYX           -0.69
оригіналу     -0.68
"'");         -0.68
]))           -0.68
")));         -0.67
compri        -0.67
)");          -0.66
alberto       -0.66
Positive Logits
Token         Logit
cat           3.46
Cat           3.35
Cat           3.18
cat           3.04
cats          2.94
CAT           2.81
Cats          2.71
CAT           2.60
Cats          2.60
cats          2.53
Activations Density: 0.063%