INDEX
Explanations
No Explanations Found
Negative Logits
CBI          -0.80
sarc         -0.63
---------    -0.63
broth        -0.62
ridor        -0.62
ENDED        -0.61
inflamm      -0.61
ANN          -0.60
TRAN         -0.60
Prompt       -0.59
Positive Logits
loo          0.85
estead       0.73
ographers    0.72
ouched       0.72
adies        0.71
perty        0.71
iquette      0.71
writers      0.70
kefeller     0.70
ieties       0.69
Activations Density 0.000%
No Known Activations
This feature has no known activations.