Explanations
No Explanations Found
Negative Logits
protect     -0.77
ategory     -0.74
onym        -0.73
heter       -0.72
crime       -0.71
agall       -0.71
anya        -0.70
olon        -0.70
ignty       -0.70
piracy      -0.69
Positive Logits
Klopp       0.72
Gallagher   0.71
Dull        0.70
Eliot       0.69
Stro        0.69
Wise        0.68
excerpts    0.67
Kenobi      0.67
Blumenthal  0.64
Dill        0.63
Activations Density 0.000%
No Known Activations
This feature has no known activations.