INDEX
Explanations
mentions of specific individuals or organizations
Negative Logits
-        -0.24
-↵       -0.22
„        -0.21
--       -0.21
--↵      -0.21
..       -0.20
"'       -0.19
..↵      -0.19
»        -0.19
â        -0.19
Positive Logits
America   0.19
‘         0.18
’util     0.17
America   0.17
,’        0.15
orama     0.15
’є        0.15
.’        0.15
!’        0.14
’s        0.14
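The logit lists above rank vocabulary tokens by how strongly this feature raises or lowers their output logits. One common way feature dashboards derive such lists, assumed here rather than confirmed by this page, is to take the dot product of the feature's decoder direction with each token's unembedding vector and keep the k most negative and k most positive scores. The sketch below uses plain Python lists; `logit_effects`, `top_and_bottom`, and the toy two-dimensional vectors and token names are hypothetical illustrations, not values from this feature.

```python
# Hypothetical sketch: rank tokens by a feature's direct effect on the logits.
# Assumption: effect = dot(decoder direction, token's unembedding vector).
from typing import Dict, List, Sequence, Tuple


def logit_effects(
    decoder_dir: Sequence[float],
    unembed: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Dot product of the feature's decoder direction with each token's unembedding vector."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }


def top_and_bottom(
    effects: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (most suppressed, most promoted) tokens, k of each."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


if __name__ == "__main__":
    # Toy data only: a 2-d "residual stream" and a four-token vocabulary.
    decoder_dir = [1.0, 0.0]
    unembed = {
        "tok_a": [0.19, 0.5],
        "tok_b": [-0.24, 0.3],
        "tok_c": [-0.20, 0.0],
        "tok_d": [0.14, 9.9],
    }
    neg, pos = top_and_bottom(logit_effects(decoder_dir, unembed), k=2)
    print("negative:", neg)
    print("positive:", pos)
```

With real weights, `decoder_dir` would be the feature's row of the decoder matrix and `unembed` would map each vocabulary entry to the model's unembedding vector; the toy vocabulary here only demonstrates the ranking step.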
Activation Density: 0.003%
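Activation density is usually reported as the share of token positions on which the feature's activation is above zero, so 0.003% corresponds to roughly 3 active positions per 100,000 tokens. A minimal sketch, assuming a zero threshold and a flat sequence of per-token activations; `activation_density` is a hypothetical helper, not part of any particular library.

```python
# Hypothetical sketch: percentage of token positions where a feature fires.
# Assumption: "active" means activation strictly greater than `threshold` (default 0).
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Return the percentage (0-100) of positions whose activation exceeds threshold."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


if __name__ == "__main__":
    # Toy data: three positive activations among 100,000 positions.
    acts = [0.0] * 99_997 + [1.2, 0.5, 3.0]
    print(f"{activation_density(acts):.3f}%")
```

For example, a sequence of 100,000 activations containing exactly three positive values gives a density of 0.003%.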