INDEX
Explanations
phrases indicating alignment or consistency with specific principles or standards
phrases indicating alignment or agreement
Negative Logits
chat      -0.89
quer      -0.77
rub       -0.72
©¶æ¥µ     -0.68
chen      -0.68
alk       -0.68
gging     -0.67
ond       -0.66
Chat      -0.66
oiler     -0.65
Positive Logits
regard        0.99
regards       0.96
respect       0.89
expectations  0.78
tradition     0.78
lihood        0.74
impunity      0.74
rium          0.73
ideals        0.72
precaution    0.72
Activations Density: 0.074%