Explanations
statements involving statistics and social commentary
Negative Logits
...       -0.55
...↵      -0.53
...↵↵     -0.50
)...      -0.42
...↵      -0.39
...       -0.38
......    -0.36
...(      -0.36
..."↵     -0.36
...,      -0.36
Positive Logits
–         0.32
.–        0.32
–         0.31
🙂         0.31
–↵        0.31
….        0.30
🙂↵↵       0.30
…………      0.29
……………………  0.29
–↵↵       0.28
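Logit tables like this one often print tokens in GPT-2's byte-level BPE alphabet, where every UTF-8 byte is mapped to a printable character, so a raw token such as "ðŁĻĤ" is the emoji 🙂 and "âĢ¦" is the ellipsis "…". The page does not name the model or tokenizer, so the sketch below assumes a GPT-2-style byte-level tokenizer and reimplements the published byte-to-character table to turn such strings back into text. The names `bytes_to_unicode` (after the function in OpenAI's GPT-2 encoder), `CHAR_TO_BYTE`, and `decode_token` are illustrative, not part of this dashboard.

```python
# Sketch (assumes a GPT-2-style byte-level BPE tokenizer): rebuild the
# reversible byte <-> printable-character table and use it to decode tokens.


def bytes_to_unicode() -> dict[int, str]:
    """Map each byte value 0-255 to a printable Unicode character."""
    # Printable ASCII (!..~) and most of Latin-1 (0xA1..0xAC, 0xAE..0xFF)
    # keep their own code points.
    byte_values = (
        list(range(0x21, 0x7E + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    code_points = byte_values[:]
    shift = 0
    # Every other byte (space, control characters, 0x7F-0xA0, 0xAD) is
    # moved to code point 256 + shift, in increasing byte order.
    for b in range(256):
        if b not in byte_values:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


# Inverse table: printable character -> original byte value.
CHAR_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a byte-level token string such as 'ðŁĻĤ' back into readable text."""
    raw = bytes(CHAR_TO_BYTE[ch] for ch in token)
    # A lone token may hold only part of a multi-byte character.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["ðŁĻĤ", "âĢ¦", "âĢĵ", "Ġthe", "Ċ"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Because the table is a bijection over all 256 byte values, decoding loses nothing; `errors="replace"` only matters for a token that carries part of a multi-byte character.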
Activation Density: 1.009%