INDEX
Explanations
No Explanations Found
Negative Logits
extras    -0.66
Index     -0.63
bc        -0.61
backer    -0.61
Len       -0.59
itton     -0.58
downed    -0.58
ength     -0.57
owicz     -0.57
zen       -0.57
Positive Logits
oming       0.80
mop         0.72
anasia      0.71
fed         0.70
rolet       0.70
mercial     0.65
itual       0.65
transplant  0.64
ktop        0.64
vernment    0.63
Activations Density 0.000%
This feature has no known activations.