INDEX
Explanations
references to religious practices and principles
Negative Logits
nữa         -0.50
所以        -0.45
ubi         -0.45
abili       -0.44
Pourtant    -0.43
حاد         -0.42
Deshalb     -0.40
Nope        -0.40
đâu         -0.40
Moreover    -0.39
Positive Logits
using         1.56
ignoring      1.28
keeping       1.25
assuming      1.24
utilizando    1.23
utilizing     1.23
utilizzando   1.23
используя     1.22
USING         1.21
using         1.20
Activations Density: 1.022%