Explanations
phrases related to self-talk and internal dialogue
concepts related to self-reflection and personal actions
Negative Logits
Reconstruction    -0.64
ighed             -0.57
purported         -0.56
gerald            -0.55
supplemented      -0.55
Annex             -0.54
instituted        -0.51
Ĭ±                -0.50
Il                -0.50
1967              -0.49
Positive Logits
yourself          1.50
yourselves        1.27
Yourself          1.09
your              0.96
YOUR              0.85
your              0.78
Your              0.76
Your              0.72
poke              0.64
yours             0.63
Activations Density 0.743%