Explanations
phrases related to questioning or making statements about various subjects
questions related to accountability and critique
Negative Logits
yssey          -0.85
©¶æ¥µ          -0.82
inery          -0.74
yk             -0.72
CV             -0.71
imil           -0.69
enez           -0.67
Ire            -0.67
accompanied    -0.67
20439          -0.67
Positive Logits
?:             1.39
?'             1.32
?              1.28
?"             1.27
?",            1.27
?".            1.27
?'"            1.27
?)             1.25
?).            1.22
...?           1.20
Activations Density 0.321%