Explanations
statements addressing the reader directly about their actions or decisions
Negative Logits
活          -0.18
Bec         -0.17
urdy        -0.15
DATE        -0.14
bec         -0.14
aines       -0.14
Bien        -0.14
becoming    -0.14
/goto       -0.13
prene       -0.13
Positive Logits
encounter       0.22
experience      0.22
_experience     0.18
RIX             0.18
experiences     0.18
encounters      0.18
experience      0.18
experiencing    0.17
Experience      0.17
encountering    0.17
Activations Density 0.161%