Explanations
greetings or welcome messages in text
references to groups of people in a conversational context
Negative Logits
territ      -0.69
sole        -0.69
adem        -0.57
Ukrain      -0.56
mination    -0.54
tenant      -0.53
ourses      -0.53
uphem       -0.52
Dialog      -0.52
occupies    -0.52
Positive Logits
!           0.90
:)          0.81
!!!!        0.80
opausal     0.79
!!          0.79
!!!         0.78
alike       0.77
!:          0.77
🙂          0.75
:-)         0.75
Activations Density 0.079%