INDEX
Explanations
No Explanations Found
Negative Logits
abouts        -0.88
tim           -0.86
aun           -0.85
stru          -0.67
trace         -0.66
piv           -0.64
              -0.64
fund          -0.64
transitioned  -0.63
estone        -0.63
Positive Logits
ILCS           0.77
amins          0.71
censor         0.70
cill           0.67
righteousness  0.66
actionGroup    0.65
censorship     0.65
thora          0.65
MENT           0.63
piracy         0.63
Activations Density 0.000%
No Known Activations
This feature has no known activations.