INDEX
Explanations
No Explanations Found
Negative Logits
imus      -0.74
hetics    -0.71
oway      -0.70
Baptist   -0.70
pheus     -0.69
ocus      -0.68
uther     -0.68
yrus      -0.67
aptic     -0.66
itars     -0.66
Positive Logits
theless       0.79
netflix       0.77
env           0.62
antidote      0.61
Hig           0.61
battlefield   0.61
harb          0.58
acron         0.58
ned           0.58
complied      0.57
Activations Density: 0.000%
No Known Activations
This feature has no known activations.