INDEX
Explanations
A period (.) followed by a word or number
Negative Logits
kembangan     -1.02
masalahan     -1.00
whose         -0.79
峒            -0.75
braz          -0.75
although      -0.74
kauft         -0.73
meniz         -0.73
个数          -0.71
              -0.71
Positive Logits
Technik       0.96
Kako          0.90
berikut       0.90
飭            0.90
sopra         0.85
actitudes     0.85
']=           0.85
ppure         0.84
墩            0.84
entlichen     0.81
Activations Density 0.001%