INDEX
Explanations
No Explanations Found
Negative Logits
perse        -0.87
rika         -0.76
rition       -0.75
otle         -0.74
Constructed  -0.71
chell        -0.70
Paste        -0.69
iaries       -0.69
atus         -0.69
acity        -0.69
Positive Logits
embargo      0.79
dominated    0.78
starved      0.76
deceived     0.76
deprived     0.76
ļéĨĴ         0.74
desp         0.74
subord       0.72
perce        0.71
offended     0.71
Activations Density: 0.000%
No Known Activations
This feature has no known activations.