Explanations
words and phrases that indicate judgment or decision-making contexts
Negative Logits
uity      -0.15
Äįe       -0.15
acket     -0.14
inesis    -0.14
usu       -0.14
owied     -0.13
ãģĮãģĦ    -0.13
aÄį       -0.12
ниÑĨ      -0.12
achts     -0.12
Positive Logits
ej        0.14
ÂĿ        0.14
—         0.14
NewItem   0.13
jer       0.13
odd       0.13
å¢        0.13
(*)(      0.12
TMPro     0.12
ena       0.12
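
Several token strings in the logit lists (for example `Äįe` and `ãģĮãģĦ`) look like the display form used by GPT-2-style byte-level BPE vocabularies, where each raw byte is shown as a printable stand-in character. The source does not say which tokenizer produced them, so this is an assumption. Under it, a minimal sketch for turning such strings back into readable text follows; the names `gpt2_byte_decoder` and `decode_token` are hypothetical helpers, not part of any dashboard API.

```python
def gpt2_byte_decoder():
    """Build the inverse of the GPT-2-style byte-to-unicode display table.

    Assumption: the tokens were rendered with the GPT-2 byte-level BPE scheme.
    """
    # Bytes that are displayed as themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Every remaining byte is displayed as code point 256, 257, ... in order.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {chr(c): b for b, c in zip(byte_values, code_points)}


DECODER = gpt2_byte_decoder()


def decode_token(display):
    """Map a displayed token back to text; undecodable bytes become U+FFFD."""
    raw = bytearray()
    for ch in display:
        if ch in DECODER:
            raw.append(DECODER[ch])
        else:
            # Character outside the display table: keep it unchanged.
            raw.extend(ch.encode("utf-8"))
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for token in ["uity", "Äįe", "aÄį", "ãģĮãģĦ"]:
        print(repr(token), "->", repr(decode_token(token)))
```

Tokens that cover only part of a multi-byte character (such as `å¢`) cannot be decoded on their own and come back with a replacement character, which is expected for byte-level vocabularies.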
Activations Density 0.015%
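
The density figure is read here as the fraction of sampled tokens on which the feature fires, which is an assumption since the source does not define it. A minimal sketch under that reading, with synthetic activations and a hypothetical helper name `activation_density`:

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens with a nonzero activation.
    """
    if not activations:
        return 0.0
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


if __name__ == "__main__":
    # Synthetic data: 3 active tokens out of 20,000.
    sample = [0.0] * 20000
    for position in (17, 4242, 19999):
        sample[position] = 1.3
    print(f"{activation_density(sample):.3%}")
```

A feature that fires on 3 of 20,000 tokens has the same density as the one reported above, which gives a sense of how sparse it is.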