INDEX
Explanations
words related to general commentary or opinions
references to group dynamics and collective actions or opinions
Negative Logits
Token      Logit
"},"       -0.56
namely     -0.54
viz        -0.50
').        -0.49
'.         -0.47
Firstly    -0.43
."         -0.42
':         -0.42
".[        -0.42
Whilst     -0.37
Positive Logits
Token      Logit
,          1.04
?,         0.81
!,         0.81
,,         0.81
,...       0.79
.,         0.78
,[         0.73
*,         0.72
+,         0.70
,-         0.70
Activations Density: 3.160%
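
The page does not say how the logit lists or the density figure are produced. The sketch below assumes a convention common to dashboards of this kind: a token's logit value is the dot product of the feature's output direction with that token's unembedding vector, and activation density is the fraction of token positions where the feature's activation is above zero. Every name here (logit_effects, top_and_bottom, activation_density) and every number in the toy data is hypothetical and for illustration only; none of it is taken from the tool that generated this page.

```python
from typing import Dict, List, Sequence, Tuple


def logit_effects(direction: Sequence[float],
                  unembedding: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Dot the (assumed) feature direction with each token's unembedding vector."""
    return {
        token: sum(d * w for d, w in zip(direction, weights))
        for token, weights in unembedding.items()
    }


def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative tokens."""
    ranked = sorted(effects.items(), key=lambda item: item[1])
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return positive, negative


def activation_density(activations: Sequence[float]) -> float:
    """Fraction of token positions where the activation is above zero."""
    if not activations:
        return 0.0
    return sum(1 for a in activations if a > 0) / len(activations)


# Toy, made-up data: a 2-dimensional direction and a 3-token vocabulary.
toy_direction = [1.0, -0.5]
toy_unembedding = {
    ",": [1.0, 0.0],
    "namely": [-0.5, 0.25],
    "the": [0.0, 0.0],
}

effects = logit_effects(toy_direction, toy_unembedding)
positive, negative = top_and_bottom(effects, k=1)
density = activation_density([0.0, 0.0, 0.7, 0.0])

print(positive, negative, f"{density:.3%}")
```

Under these assumptions, a feature whose positive list is dominated by comma-bearing tokens and whose negative list holds sentence-closing punctuation and openers such as "namely" or "Firstly" would be read as promoting a comma as the next token.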