INDEX
Explanations
punctuation marks in the text
Negative Logits
ãĤ©    -0.22
ãģĦãģŁ    -0.20
ed    -0.19
————————————————    -0.18
alled    -0.18
don    -0.18
nhau    -0.17
fold    -0.17
des    -0.16
ะ    -0.16
Positive Logits
ร    0.24
istics    0.19
wner    0.18
ãģĤãģ£ãģŁ    0.17
ãģĤãĤĬ    0.17
istic    0.17
ughter    0.17
ÌĨ    0.17
ment    0.17
ughters    0.17
Activations Density 0.205%