INDEX
Explanations
No Explanations Found
Negative Logits
encies      -0.84
ency        -0.70
ilege       -0.70
minority    -0.68
Fake        -0.67
Corpus      -0.65
conserv     -0.64
ĵĺ          -0.64
pmwiki      -0.64
locality    -0.63
Positive Logits
ificial     0.79
cery        0.76
ertodd      0.70
alth        0.70
odes        0.69
icro        0.69
pport       0.64
Capture     0.64
istries     0.64
ary         0.64
Activations Density 0.000%
No Known Activations
This feature has no known activations.