Explanations
No Explanations Found
Negative Logits
Token     Logit
obal      -0.78
µ         -0.74
destro    -0.69
ciating   -0.68
uana      -0.68
ivot      -0.68
zb        -0.66
monds     -0.64
±         -0.63
ython     -0.62
Positive Logits
Token          Logit
gyn            0.64
Fritz          0.62
Strauss        0.61
fred           0.60
achus          0.59
heimer         0.59
vier           0.58
born           0.58
Babe           0.58
Conservative   0.57
Activations Density: 0.000%
No Known Activations
This feature has no known activations.