INDEX
Explanations
phrases indicating collaboration or teamwork in various contexts
Negative Logits
rong         -0.18
omid         -0.17
enthal       -0.16
rosso        -0.16
uo           -0.15
ould         -0.15
ooks         -0.14
isay         -0.14
abb          -0.14
iek          -0.14
Positive Logits
odox         0.15
kali         0.15
074          0.14
Doe          0.14
)this        0.14
CP           0.13
901          0.13
conven       0.13
Principal    0.13
Burgess      0.13
Activations Density: 0.222%