INDEX
Explanations
mentions of negative events or allegations
dialogue or quotes from various speakers in a text
Negative Logits
¬¼          -0.82
²¾          -0.80
ħĭ          -0.80
etheless    -0.79
ĻĤ          -0.69
anmar       -0.69
©¶æ         -0.68
ļéĨĴ        -0.65
ĪĴ          -0.64
Ĭ           -0.63
Positive Logits
writes      1.28
wrote       1.22
reads       1.18
explains    1.13
recalls     1.12
according   1.05
says        1.05
recalled    1.05
observes    1.04
explained   1.03
Activations Density 0.101%