INDEX
Explanations
the presence of personal statements and opinions
Negative Logits
Efq                 -1.07
itſelf              -1.06
antMatchers         -1.03
Houſe               -1.02
myſelf              -1.01
becauſe             -1.00
raiſ                -1.00
་་                  -0.98
存于互联网档案馆    -0.97
houſe               -0.97
Positive Logits
↵↵                  0.76
The                 0.64
↵                   0.63
.                   0.60
                    0.59
(                   0.58
and                 0.57
The                 0.57
,                   0.56
As                  0.52
Activations Density: 0.462%