INDEX
Explanations
the presence of quotation marks or apostrophes in the text
Negative Logits
\""
-0.95
}}"
-0.89
osoba
-0.85
“
-0.85
Menge
-0.84
{}".
-0.84
(",")
-0.84
"}
-0.84
}".
-0.81
Peque
-0.80
Positive Logits
!='
1.08
'
1.05
Ndr
1.03
]='\
0.95
='
0.95
=’
0.94
('
0.94
==='
0.93
>';
0.93
'.
0.92
Activations Density 0.247%