INDEX
Explanations
No Explanations Found
Negative Logits
predec        -0.80
incent        -0.80
constitu      -0.76
conting       -0.74
commitments   -0.72
ancest        -0.69
prosec        -0.69
undai         -0.66
thinkable     -0.66
ŃĶ            -0.66
Positive Logits
ORY           0.82
Hale          0.73
Tart          0.72
Kon           0.70
ually         0.70
renheit       0.69
Raz           0.69
Griffin       0.69
Valencia      0.68
river         0.67
Activations Density: 0.000%
No Known Activations
This feature has no known activations.