Explanations
No Explanations Found
Negative Logits
Token       Logit
bara        -0.71
loader      -0.71
threaded    -0.70
scrutiny    -0.69
Oracle      -0.67
reciproc    -0.66
acknow      -0.65
piv         -0.65
voc         -0.64
Wiki        -0.63
Positive Logits
Token         Logit
âĹ¼           0.88
engineering   0.80
Ballard       0.77
arc           0.75
Surv          0.72
Allison       0.71
occ           0.68
atell         0.68
utical        0.67
Ala           0.67
Activations Density: 0.000%
No Known Activations
This feature has no known activations.