INDEX
Explanations
No Explanations Found
Negative Logits
Token       Logit
Choice      -0.71
Chair       -0.68
ogens       -0.67
Vest        -0.64
univers     -0.64
Nusra       -0.63
Printing    -0.62
Checking    -0.61
Arg         -0.61
onom        -0.61
Positive Logits
Token       Logit
antz        0.82
incompet    0.78
renheit     0.66
tg          0.65
aries       0.65
thirsty     0.64
ateur       0.62
irie        0.61
badly       0.60
ushima      0.60
Activations Density: 0.000%
No Known Activations
This feature has no known activations.