Explanations
punctuation marks, particularly at the end of phrases
Negative Logits
Token         Logit
/             -0.50
Er            -0.50
I             -0.50
LayoutStyle   -0.48
scheme        -0.48
trans         -0.48
伝わ          -0.47
thứ           -0.47
or            -0.47
للا           -0.46
Positive Logits
Token         Logit
")            1.82
?")           1.73
!")           1.64
.")           1.63
'")           1.62
?')           1.61
')            1.60
."]           1.58
%")           1.58
!')           1.53
Activations Density 0.135%