Explanations
statements related to opinions or declarations made by individuals
Negative Logits
Logit  Token
-1.26  »,
-1.26  ?",
-1.25  !",
-1.23  。」
-1.23  "],
-1.21  .",
-1.21  "),
-1.20  」,
-1.19  」
-1.19  "،
Positive Logits
Logit  Token
2.03  “
1.57  "
1.07  ''
1.06  ``
0.95  “...
0.69
0.61  ...
0.60  “[
0.59  “¿
0.58  ‘‘
Activations Density 0.267%