INDEX
Explanations
phrases indicating authority and accountability
Negative Logits
BoxFit             -0.84
__*/               -0.77
Personendaten      -0.72
newOwner           -0.68
InputBorder        -0.68
__(/*!             -0.67
دانشنامهٔ          -0.67
Мексичка           -0.66
RectangleBorder    -0.66
findpost           -0.66
Positive Logits
↵          0.27
ac         0.27
diss       0.26
<eos>      0.25
barang     0.24
特         0.24
bague      0.23
:          0.23
terper     0.23
bag        0.23
Activations Density 0.758%
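
The page does not define the density figure. Assuming it is the fraction of sampled token positions on which the feature has a nonzero activation, it could be computed roughly as in the sketch below. The function name `activation_density`, the `threshold` parameter, and the sample data are hypothetical and chosen only to reproduce a value of 0.758%.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means the share of positions where the feature fires.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Hypothetical data: 100,000 token positions, 758 of which activate.
sample = [1.5] * 758 + [0.0] * (100_000 - 758)
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density below 1% would mean the feature is sparse: it fires on fewer than one in a hundred token positions.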