Explanations
phrases related to reflection and decision-making processes
Negative Logits
whereas      -0.18
although     -0.17
tuy          -0.15
but          -0.15
dana         -0.14
nor          -0.14
vince        -0.14
しており     -0.14
itest        -0.14
ostel        -0.14
Positive Logits
وت      0.20
แล      0.19
して    0.19
んで    0.19
，把    0.18
えて    0.18
いて    0.18
然后    0.17
って    0.17
并      0.17
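
The non-Latin tokens in these lists appear to come from a byte-level BPE vocabulary, in which each UTF-8 byte of a token is displayed as a stand-in character (for example, `ãģĹãģ¦` is the raw vocabulary form of `して`). The sketch below shows how such raw strings can be turned back into readable text. It assumes the tokenizer uses the GPT-2 style byte-to-character table, which the dump itself does not confirm; `bytes_to_unicode` and `decode_token` are illustrative helper names, not part of any dashboard API.

```python
def bytes_to_unicode() -> dict:
    """Build the GPT-2 style table mapping each byte 0..255 to a printable character."""
    # Bytes that are already printable keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    printable_set = set(printable)
    table = {b: chr(b) for b in printable}
    # Every remaining byte is assigned the next free code point from 256 upward.
    offset = 0
    for b in range(256):
        if b not in printable_set:
            table[b] = chr(256 + offset)
            offset += 1
    return table


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {ch: b for b, ch in BYTE_TO_CHAR.items()}


def decode_token(raw: str) -> str:
    """Convert a raw byte-level vocabulary string into readable UTF-8 text."""
    # Assumption: every character of `raw` belongs to the byte-level table.
    # A character outside the table raises KeyError.
    data = bytes(CHAR_TO_BYTE[ch] for ch in raw)
    # Tokens can end in the middle of a multi-byte character, so replace
    # undecodable bytes instead of raising.
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for raw in ["ãģĹãģ¦", "å¹¶", "whereas"]:
        print(repr(raw), "->", repr(decode_token(raw)))
```

Plain ASCII tokens such as `whereas` pass through unchanged, since printable ASCII bytes map to themselves. Raw strings that were altered during extraction (for instance by Unicode normalization) may no longer round-trip through this table.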
Activations Density 0.413%