Explanations
No Explanations Found
Negative Logits
gob       -0.73
bol       -0.72
apo       -0.71
undo      -0.71
weet      -0.71
simul     -0.70
SB        -0.69
irtual    -0.69
tu        -0.65
pa        -0.64
Positive Logits
Dialogue   1.10
pmwiki     0.86
Dead       0.78
unlaw      0.73
Stretch    0.72
Rh         0.72
ALSE       0.72
Sham       0.71
OTOS       0.70
UFF        0.68
Activation Density: 0.000%
No Known Activations
This feature has no known activations.