INDEX
Explanations
phrases related to social interactions and group activities
Negative Logits
…↵            -0.28
…and          -0.25
…”            -0.24
…             -0.23
….            -0.22
…↵            -0.22
…I            -0.21
…the          -0.21
…but          -0.21
……            -0.21
Positive Logits
#af           0.16
#ab           0.16
#ac           0.16
#ad           0.15
-*-č↵         0.15
)frame        0.14
)application  0.14
#aa           0.14
@js           0.14
/******/      0.13
Activations Density 98.153%