INDEX
Explanations
statements of opinions or reactions, often characterized by specific phrasing
Negative Logits
'-';↵
-0.14
'/';↵
-0.14
},{↵
-0.13
"/"↵
-0.13
VOID
-0.12
ozor
-0.12
ado
-0.12
จากการ
-0.12
ith
-0.12
雅黑
-0.12
Positive Logits
"
0.46
“
0.38
'
0.34
「
0.27
``
0.26
‘
0.26
`
0.25
\"
0.25
«
0.24
「我
0.24
Activations Density 0.535%