INDEX
Explanations
No Explanations Found
Negative Logits
Token     Logit
Cu        -0.69
sav       -0.67
rgb       -0.67
hod       -0.63
resh      -0.63
scram     -0.61
Zot       -0.61
factor    -0.61
untu      -0.60
rums      -0.60
Positive Logits
Token           Logit
Interstitial    0.68
Rebellion       0.65
bidden          0.64
dissatisf       0.64
ukemia          0.62
folly           0.62
acca            0.59
lein            0.59
onis            0.58
bad             0.58
Activations Density: 0.000%
No Known Activations
This feature has no known activations.