INDEX
Explanations
No Explanations Found
Negative Logits
hostage      -0.72
pend         -0.69
jad          -0.64
prison       -0.63
away         -0.62
cock         -0.61
commit       -0.61
---------    -0.61
times        -0.60
Bang         -0.60
Positive Logits
chery        0.78
eria         0.76
Beir         0.75
ajo          0.75
iculture     0.70
kefeller     0.70
Shinra       0.68
icter        0.68
ierre        0.67
allery       0.66
Activations Density 0.000%
No Known Activations
This feature has no known activations.