INDEX
Explanations
No Explanations Found
Negative Logits
ertodd    -0.80
itching   -0.77
uty       -0.74
bryce     -0.72
@#&       -0.71
eno       -0.71
vale      -0.70
Laughs    -0.69
paces     -0.69
lehem     -0.68
Positive Logits
descendant    0.71
bour          0.71
Aus           0.67
labelled      0.67
Indust        0.66
Bulls         0.64
BW            0.63
descendants   0.63
imitation     0.63
Bravo         0.61
Activations Density 0.000%
No Known Activations
This feature has no known activations.