INDEX
Explanations
the presence of specific temporal or location phrases within the text
Negative Logits
kah             -0.17
oretical        -0.17
gether          -0.16
orem            -0.16
allee           -0.15
oret            -0.15
venge           -0.15
aring           -0.15
withstanding    -0.14
貴              -0.14
Positive Logits
shore           0.17
obot            0.16
quis            0.15
sha             0.15
order           0.15
لت              0.15
unc             0.14
col             0.14
Batt            0.14
iqu             0.14
Activations Density 1.363%