Explanations
No Explanations Found
Negative Logits
Patty   -0.72
amina   -0.68
adish   -0.68
Lowry   -0.66
Fuller  -0.64
ia      -0.63
law     -0.62
Barron  -0.62
ĥ       -0.62
idity   -0.61
Positive Logits
showc      0.77
forward    0.76
elig       0.74
quartered  0.74
compan     0.72
minist     0.70
eworld     0.70
powd       0.67
perf       0.66
feder      0.66
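The two lists above show the feature's strongest negative and positive logit effects. The page does not say how these values are computed. The sketch below makes two assumptions, both common for sparse-autoencoder dashboards:

- Each value is the feature's decoder direction dotted with one token's column of the model's unembedding matrix.
- Each list is simply the k lowest or k highest of those values.

The function names `logit_weights` and `top_logits` are hypothetical. The toy matrices in the usage lines are made up for illustration and are not this feature's weights.

```python
import heapq
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, float]

def logit_weights(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]],
                  vocab: Sequence[str]) -> Dict[str, float]:
    """Dot the decoder direction with each token's unembedding column.

    ASSUMPTION: this is how the dashboard derives its logit values.
    `unembed` is d_model x vocab_size, stored as row-major nested lists.
    """
    return {
        tok: sum(d * unembed[i][j] for i, d in enumerate(decoder_dir))
        for j, tok in enumerate(vocab)
    }

def top_logits(weights: Dict[str, float], k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Return the k most negative and the k most positive (token, weight) pairs."""
    negative = heapq.nsmallest(k, weights.items(), key=lambda kv: kv[1])
    positive = heapq.nlargest(k, weights.items(), key=lambda kv: kv[1])
    return negative, positive

# Toy example: a 2-dimensional model with a 3-token vocabulary (made-up numbers).
weights = logit_weights([1.0, -0.5],
                        [[0.2, -0.4, 0.1],
                         [0.6, 0.2, -0.8]],
                        ["a", "b", "c"])
negative, positive = top_logits(weights, k=1)
print(negative, positive)
```

`heapq.nsmallest`/`nlargest` avoid sorting the whole vocabulary. That matters when the vocabulary has tens of thousands of tokens and only the top ten on each side are displayed.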
Activation Density 0.000%
No Known Activations
This feature has no known activations.
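The page does not define activation density. A common definition, assumed here, is the percentage of sampled tokens on which the feature's activation is strictly positive. Under that definition, a feature with no recorded activations has a density of 0.000%, which matches the value reported above. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold`.

    ASSUMPTION: "active" means strictly greater than the threshold (default 0).
    Returns 0.0 for an empty sample instead of dividing by zero.
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    return 100.0 * active / total if total else 0.0

# A feature that never fires on the sampled tokens.
print(f"{activation_density([0.0, 0.0, 0.0, 0.0]):.3f}%")  # prints 0.000%
```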