INDEX
Explanations
No Explanations Found
Negative Logits
Ys      1.04
POP     1.02
RSC     0.99
Pop     0.97
oy      0.96
Pop     0.96
roy     0.96
Obj     0.95
Poc     0.95
Dy      0.95
Positive Logits
mannit              0.83
----------------    0.82
Herman              0.79
mannitol            0.79
Kran                0.73
Cann                0.72
katan               0.72
Vetter              0.72
Rift                0.70
Chandler            0.70
Activations Density: 2.043%