Explanations
No Explanations Found
Negative Logits
Token       Logit
rophe       -0.73
rive        -0.69
onial       -0.67
hesive      -0.67
perty       -0.66
roadside    -0.65
oro         -0.65
vernment    -0.65
Loc         -0.64
actic       -0.64
Positive Logits
Token       Logit
iors        0.74
colours     0.74
wcsstore    0.73
colors      0.66
Bench       0.60
stripes     0.59
Saban       0.59
hawks       0.58
gpu         0.58
Glacier     0.58
Activation Density: 0.000%
No Known Activations
This feature has no known activations.