INDEX
Explanations
No Explanations Found
Negative Logits
OND        -0.78
IST        -0.68
Jord       -0.64
UES        -0.62
laun       -0.61
IPP        -0.61
Cmd        -0.61
roots      -0.60
Airl       -0.60
Cherokee   -0.59
Positive Logits
ername     0.71
teasp      0.69
ventus     0.69
ellectual  0.67
ptin       0.65
cliffe     0.65
ibling     0.64
mitter     0.64
ividual    0.64
rotein     0.64
Activations Density 0.000%
No Known Activations
This feature has no known activations.