INDEX
Explanations
No Explanations Found
Negative Logits
adesh    -0.81
elaide   -0.73
ieth     -0.72
itals    -0.71
aughs    -0.71
ocious   -0.67
pless    -0.63
obbies   -0.63
acho     -0.62
Ship     -0.62
Positive Logits
Strauss     0.71
intermedi   0.69
INK         0.68
Manor       0.66
Mush        0.65
Cheney      0.65
TAG         0.64
riks        0.63
jon         0.63
Bernstein   0.62
Activations Density 0.000%
No Known Activations
This feature has no known activations.