Explanations
No Explanations Found
Negative Logits
greatly   0.38
guess     0.38
嗇         0.37
dunk      0.37
;%%       0.36
dips      0.36
scald     0.36
vastly    0.36
⏭         0.36
evol      0.36
Positive Logits
Cock      1.11
Cock      0.96
cock      0.95
cock      0.93
kok       0.72
cockpit   0.70
Cocker    0.70
Kok       0.68
кок       0.68
cocker    0.64
Activations Density 0.005%