INDEX
Explanations
sports teams and collectibles
Negative Logits
TokenNameEQUAL   -0.86
humanos          -0.85
INSPIRE          -0.74
screenshots      -0.69
ンダル           -0.69
Согласно         -0.69
jectures         -0.68
埃及             -0.68
pata             -0.67
malfunction      -0.66
Positive Logits
cards            1.01
card             0.96
Pose             0.85
backs            0.84
tobacco          0.81
poses            0.81
Pose             0.80
expressions      0.80
HORIZONTAL       0.80
borders          0.79
Activations Density 0.006%