INDEX
Explanations
negations or forms of denial in the text
Negative Logits
orp        -0.16
ndern      -0.16
avor       -0.15
avou       -0.15
kiem       -0.14
criptor    -0.14
hiba       -0.14
キング     -0.14
thon       -0.14
Tan        -0.14
Positive Logits
thì        0.22
的话       0.22
then       0.18
then       0.16
THEN       0.16
则         0.15
，则       0.15
fte        0.15
Hak        0.15
olas       0.15
Activations Density 0.072%