INDEX
Explanations
concepts related to freedom or lack of constraints
Negative Logits
Token       Logit
ãĥ³ãĤ°      -0.17
uality      -0.16
lou         -0.16
riad        -0.16
py          -0.16
la          -0.15
lah         -0.15
aul         -0.15
st          -0.15
nya         -0.15
Positive Logits
Token       Logit
bies        0.34
bie         0.32
-floating   0.28
lance       0.26
zers        0.26
bsd         0.25
-wheel      0.25
-flow       0.24
zed         0.24
-standing   0.24
Activations Density: 0.065%