INDEX
Explanations
significant statements or quotations
statements or descriptions related to performance or conduct
Negative Logits
"…      -1.47
"…      -1.32
…"      -1.25
…       -1.21
–       -1.17
–       -1.08
″       -1.07
…)      -1.06
🙂      -1.02
…]      -0.86
Positive Logits
�       2.51
''      2.43
�       2.41
.''     2.39
''      2.29
``      2.18
``      2.17
,''     2.16
.�      2.15
''.     2.07
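The negative and positive logit lists are typical of sparse-autoencoder feature dashboards. This page does not say how they were produced. One common approach, assumed here, projects the feature's decoder direction through the model's unembedding matrix and keeps the lowest- and highest-scoring tokens. The sketch below uses a random toy matrix. `top_logit_effects`, `decoder_dir`, and `W_U` are hypothetical names.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct effect on the logits.

    Assumption (not confirmed by the dashboard): the logit lists come from
    projecting the feature's decoder direction through the unembedding.
    decoder_dir: shape (d_model,); W_U: shape (d_model, vocab_size).
    """
    scores = decoder_dir @ W_U          # one logit contribution per vocabulary token
    order = np.argsort(scores)          # token indices sorted by ascending score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example: 4-dimensional residual stream, 5-token vocabulary.
rng = np.random.default_rng(0)
W_U = rng.normal(size=(4, 5))
decoder_dir = rng.normal(size=4)
vocab = ["a", "b", "c", "d", "e"]
neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

With a real model, `W_U` would be the unembedding matrix and `vocab` the tokenizer's token strings. Those strings are why raw byte-level tokens such as partial UTF-8 sequences can appear as `�`.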
Activations Density 0.070%
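Here is a minimal sketch of how an activation density like this could be computed. It assumes the figure means the fraction of token positions where the feature's activation is above zero, since the page does not define it. The helper `activation_density` and its `threshold` parameter are hypothetical. The toy data fires on 7 of 10,000 tokens, which happens to give the same 0.070% figure.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Fraction of token positions where a feature's activation exceeds threshold.

    Assumption: "Activations Density" is read as the share of tokens on which
    the feature fires (activation > 0); the dashboard's exact definition may differ.
    """
    activations = np.asarray(activations)
    return float(np.mean(activations > threshold))

# Toy example: a sparse feature that fires on 7 of 10,000 tokens.
acts = np.zeros(10_000)
acts[[3, 50, 900, 1234, 4000, 7777, 9999]] = 1.5
density = activation_density(acts)
print(f"{density:.3%}")  # 0.070%
```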