Explanations
themes related to decision-making and existential concerns
Negative Logits
你们  -0.18
ois  -0.16
تون  -0.16
vez  -0.14
yours  -0.14
Richards  -0.14
oin  -0.14
oba  -0.13
zung  -0.13
ahir  -0.13
Positive Logits
ourselves  0.49
we  0.36
我们  0.32
我們  0.32
우리는  0.31
，我们  0.28
our  0.28
kita  0.25
abych  0.25
we  0.23
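Logit lists like the two above are commonly computed by projecting the feature's decoder direction onto the model's unembedding matrix and sorting the per-token scores; the most positive scores are tokens the feature promotes, the most negative are tokens it suppresses. The sketch below is a minimal stdlib-only illustration under that assumption. The names `decoder_direction`, `unembed`, `vocab`, `logit_effects` and `top_logits` are hypothetical, and the toy numbers are made up, not taken from this feature.

```python
from typing import List, Sequence, Tuple

def logit_effects(decoder_direction: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Dot the feature's decoder vector with every vocabulary column.

    Assumed (hypothetical) layout: unembed has d_model rows and vocab_size columns.
    """
    vocab_size = len(unembed[0])
    return [
        sum(decoder_direction[i] * unembed[i][j] for i in range(len(decoder_direction)))
        for j in range(vocab_size)
    ]

def top_logits(decoder_direction: Sequence[float],
               unembed: Sequence[Sequence[float]],
               vocab: Sequence[str],
               k: int = 10) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and k most negative (token, score) pairs."""
    scores = logit_effects(decoder_direction, unembed)
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative score first
    positive = ranked[::-1][:k]    # most positive score first
    return positive, negative

# Toy example (made-up values): d_model = 2, vocabulary of 4 tokens.
vocab = ["we", "our", "yours", "ois"]
unembed = [
    [1.0, 0.5, -0.5, -1.0],
    [0.2, 0.1, 0.0, -0.2],
]
direction = [0.4, 0.5]
pos, neg = top_logits(direction, unembed, vocab, k=2)
```

Here `pos` ranks "we" above "our" and `neg` ranks "ois" below "yours". Real dashboards may also fold in layer-norm scaling before the projection; this sketch omits that step.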
Activation density: 0.404%
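Activation density is usually reported as the percentage of sampled tokens on which the feature fires, so 0.404% corresponds to roughly 4 tokens in every 1,000. A minimal sketch, assuming "fires" means an activation strictly above a threshold of zero; the function name `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0 by default)."""
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)

# Hypothetical sample: 1,000 tokens, and the feature fires on 4 of them.
acts = [0.0] * 996 + [1.3, 0.7, 2.1, 0.2]
print(f"{activation_density(acts):.3f}%")  # prints 0.400%
```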