Explanations
No Explanations Found
Negative Logits
ANCE       -0.88
JUST       -0.87
IFE        -0.73
issance    -0.73
ufact      -0.72
Kn         -0.71
YN         -0.71
minster    -0.69
Import     -0.69
Writ       -0.69
Positive Logits
atic         0.71
agraph       0.69
rats         0.68
entric       0.68
sympathetic  0.67
etting       0.67
agonist      0.67
gap          0.66
atan         0.65
agically     0.64
Activations Density: 0.000%
No Known Activations
This feature has no known activations.