Explanations
terms related to inner experiences and introspection
Negative Logits
sse         -0.17
sdale       -0.16
sin         -0.16
entifier    -0.16
adam        -0.16
åħ¥ãĤĬ      -0.16
iec         -0.16
ละ          -0.15
amel        -0.15
/***/       -0.15
Positive Logits
most        0.52
halb        0.37
MOST        0.29
-most       0.29
most        0.28
workings    0.27
/ext        0.25
wear        0.25
-city       0.23
Most        0.23
Activations Density 0.020%