INDEX
Explanations
concepts related to moral or spiritual conflict between desires
Negative Logits
-0.19  ...
-0.18  č
-0.17  ...
-0.17   
-0.16  �
-0.16  "↵
-0.15  \\
-0.15  ÂŃ
-0.15
-0.15  âĢij
Positive Logits
0.18  athe
0.16  98
0.16  3
0.16  Bour
0.15  5
0.15  4
0.15  6
0.15  esthes
0.14  jde
0.14  Athe
Activations Density 0.006%