Explanations
references to authoritative statements or claims
Negative Logits
@[            -0.15
venes         -0.14
_mime         -0.14
imson         -0.14
_gp           -0.14
enschaft      -0.14
apgolly       -0.14
amilia        -0.14
Higgins       -0.14
strup         -0.14
Positive Logits
uco           0.15
spd           0.15
is            0.15
elsen         0.15
io            0.15
idle          0.15
idy           0.14
lick          0.14
discharged    0.14
dist          0.14
Activation Density: 0.004%