INDEX
Explanations
No Explanations Found
Negative Logits
³³³         -0.89
Reviewer    -0.86
³³³³        -0.76
ARCH        -0.74
alf         -0.73
Brend       -0.72
DOM         -0.71
Farm        -0.71
Chel        -0.70
tom         -0.70
Positive Logits
ologne      0.79
weeney      0.75
TNT         0.69
comprom     0.68
stanbul     0.67
latex       0.67
undai       0.66
candles     0.66
orno        0.65
irmed       0.64
Activations Density: 0.000%
No Known Activations
This feature has no known activations.