Explanations
themes related to personal identity and interpersonal relationships
Negative Logits
まず  -0.15
ーナ  -0.14
ppo  -0.14
Already  -0.14
loi  -0.14
ルフ  -0.14
átka  -0.14
ście  -0.14
년도  -0.14
典  -0.13
Positive Logits
sometimes  1.16
occasionally  0.99
sometimes  0.98
Sometimes  0.91
Sometimes  0.87
ometimes  0.75
occasional  0.75
Occasionally  0.75
иногда  0.74
often  0.63
Activations Density 1.165%