INDEX
Explanations
phrases that indicate societal and environmental issues
Negative Logits
actionDate  -0.15
åζ          -0.14
aven        -0.14
Ñĥма        -0.14
446         -0.14
olt         -0.14
avery       -0.14
iggins      -0.13
åζ          -0.13
ilar        -0.13
Positive Logits
‘           0.24
'           0.23
“           0.22
"           0.20
«           0.18
dreaded     0.16
`           0.16
â           0.15
hidden      0.15
\"          0.15
Activations Density 0.255%