INDEX
Explanations
expressions of personal reflections and thoughts
Negative Logits
lag       -0.17
ĤŃ        -0.15
sr        -0.14
Mocks     -0.13
oro       -0.13
ilar      -0.13
action    -0.13
详æĥħ     -0.13
ister     -0.13
æĥ        -0.13
Positive Logits
thoughts        0.51
observations    0.41
Thoughts        0.40
observations    0.38
mus             0.33
Observ          0.32
Thought         0.31
notes           0.30
remarks         0.29
thought         0.29
Activations Density: 0.262%