Explanations
phrases related to misunderstandings or disagreements in social interactions
expressions of confusion or frustration about a situation
Negative Logits
ÂŃ          -1.11
âĢij         -1.06
—           -0.98
Footnote    -0.87
®,          -0.79
Thirty      -0.76
Enlarge     -0.76
–           -0.75
"—          -0.73
)—          -0.72
Positive Logits
didnt       1.71
doesnt      1.69
dont        1.68
alot        1.47
lol         1.35
english     1.35
tho         1.34
nt          1.29
dmg         1.17
wont        1.17
Activations Density 1.078%
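
The first two entries under Negative Logits (ÂŃ and âĢij) look like raw vocabulary strings rather than readable text. The sketch below shows one way to decode them, assuming the model uses a GPT-2-style byte-level BPE vocabulary; the dashboard does not name the tokenizer, so this is an assumption. The helper names `bytes_to_unicode` and `decode_token` are illustrative and not taken from the dashboard. Under that assumption, the two strings decode to a soft hyphen (U+00AD) and a non-breaking hyphen (U+2011), which would fit the typographic character of the other negative-logit tokens.

```python
def bytes_to_unicode():
    """Byte -> printable-character table in the style of GPT-2 byte-level BPE.

    Assumption: the dashboard's tokenizer follows this convention.
    """
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Every remaining byte is shifted to an unused code point at 256 and above.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: vocabulary character -> original byte.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a raw vocabulary string back to the text it encodes."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    return raw.decode("utf-8", errors="replace")


def code_points(text: str) -> list:
    """Render each character as U+XXXX so invisible characters are readable."""
    return [f"U+{ord(c):04X}" for c in text]


# The two raw strings from the list, written as escapes to avoid copy errors.
print(code_points(decode_token("\u00c2\u0143")))        # ['U+00AD'] soft hyphen
print(code_points(decode_token("\u00e2\u0122\u0133")))  # ['U+2011'] non-breaking hyphen
```

Tokens that the dashboard already shows in readable form (for example "didnt" or "Footnote") pass through `decode_token` unchanged. Characters outside the 256-entry table, such as an already-decoded dash, raise a `KeyError` in this sketch.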