Explanations
references to personal pronouns and their emotional implications
Negative Logits
/latest   -0.15
еÑİ       -0.14
etur      -0.14
uja       -0.14
ago       -0.14
iffe      -0.14
phia      -0.14
ayo       -0.13
of        -0.13
logue     -0.13
Positive Logits
/us          0.19
/her         0.17
¶Į           0.15
self         0.14
.synthetic   0.14
ityEngine    0.14
ERGY         0.13
yna          0.13
-Cs          0.13
ãĥ¼ãĥľ       0.13
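Logit lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and reading off the largest and smallest entries. The sketch below shows that idea under stated assumptions: `decoder_dir`, `W_U`, and the `logit_effects` helper are hypothetical names, the matrices are toy values rather than this feature's real weights, and the dashboard's exact computation (for example any layer-norm folding) is not confirmed by this page.

```python
import numpy as np

def logit_effects(decoder_dir, W_U, vocab, k=3):
    """Hypothetical helper: rank tokens by a feature's direct logit effect.

    decoder_dir: shape (d_model,), assumed decoder row for one feature.
    W_U: shape (d_model, vocab_size), assumed unembedding matrix.
    vocab: list of token strings of length vocab_size.
    Returns (top_positive, top_negative) as lists of (token, value) pairs.
    """
    effects = decoder_dir @ W_U          # one scalar per vocabulary token
    order = np.argsort(effects)          # ascending: most negative first
    top_neg = [(vocab[i], float(effects[i])) for i in order[:k]]
    top_pos = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return top_pos, top_neg

# Toy example with a 2-dimensional residual stream and a 4-token vocabulary.
decoder_dir = np.array([1.0, 0.0])
W_U = np.array([[0.2, -0.1, 0.05, -0.3],
                [0.0,  0.0, 0.0,   0.0]])
vocab = ["a", "b", "c", "d"]
pos, neg = logit_effects(decoder_dir, W_U, vocab, k=2)
print(pos)  # [('a', 0.2), ('c', 0.05)]
print(neg)  # [('d', -0.3), ('b', -0.1)]
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; a real dashboard would apply the same ranking over the full vocabulary.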
Activation Density: 0.167%
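Activation density is usually read as the share of tokens on which the feature's activation is above zero. A minimal sketch, assuming a strict `> 0` threshold (the page does not state the threshold it uses) and a hypothetical `activation_density` helper, shows the arithmetic. A feature that fires on 1 token out of 600 has a density of about 0.167%.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose activation exceeds `threshold` (assumed 0)."""
    acts = np.asarray(acts, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy activations: the feature fires on exactly one token out of 600.
acts = np.zeros(600)
acts[0] = 1.3
density = activation_density(acts)
print(f"{density:.3f}%")  # 0.167%
```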