Explanations
No Explanations Found
Negative Logits
renheit      -0.91
verty        -0.84
ivals        -0.83
rador        -0.81
verbs        -0.81
yip          -0.81
elong        -0.78
interrupted  -0.78
alogue       -0.77
thood        -0.76
Positive Logits
Radeon       0.74
orc          0.68
tattoo       0.63
Patch        0.63
Mull         0.62
Morph        0.61
Rx           0.60
Planes       0.58
arsenal      0.58
Targ         0.58
Activation Density: 0.000%
No Known Activations
This feature has no known activations.