INDEX
Explanations
No Explanations Found
Negative Logits
oft          -0.73
Beckham      -0.70
lain         -0.68
Jelly        -0.67
ters         -0.67
Champ        -0.65
ijn          -0.64
oqu          -0.63
Om           -0.63
Guardiola    -0.63
Positive Logits
WARE         0.87
hol          0.83
neighbors    0.77
neighbor     0.76
IMAGES       0.76
Azerb        0.76
arrang       0.75
mathemat     0.75
opio         0.74
everal       0.73
Activations Density 0.000%
No Known Activations
This feature has no known activations.