Explanations
No Explanations Found
Negative Logits
skelet         -0.71
lihood         -0.66
uncomp         -0.65
affirmative    -0.64
unchecked      -0.63
thinkable      -0.60
citiz          -0.60
ebin           -0.60
unfavorable    -0.59
ī              -0.59
Positive Logits
Lost           0.76
isite          0.75
Davis          0.72
osponsors      0.71
sbm            0.70
Lab            0.70
mat            0.69
irlf           0.66
Detect         0.66
ete            0.65
Activations Density 0.000%
No Known Activations
This feature has no known activations.