Explanations
social gatherings or events
Negative Logits
Token          Logit
ネ             -0.57
ˈ              -0.56
confir         -0.54
龍             -0.54
counterpart    -0.53
Nich           -0.50
similarly      -0.50
iman           -0.49
noon           -0.49
Reviewer       -0.49
Positive Logits
Token          Logit
etc            1.60
etc            1.17
…              0.93
…)             0.92
,...           0.92
ect            0.91
….             0.89
,              0.88
…              0.86
...)           0.83
Activations Density 0.392%