Explanations
instances of words indicating permission or consent
Negative Logits
ész             -0.15
anje            -0.15
odash           -0.14
erin            -0.14
ارÙĩ            -0.14
asurer          -0.14
ulares          -0.14
/Area           -0.13
.LayoutStyle    -0.13
ngược           -0.13
Positive Logits
alone           1.85
alone           1.56
Alone           1.50
-alone          1.19
solo            1.01
Solo            0.84
seul            0.83
Solo            0.83
lone            0.81
seule           0.78
Activations Density 0.393%