INDEX
Explanations
references to governmental and organizational structures
Negative Logits
زر      -0.14
ibold   -0.14
zek     -0.14
anky    -0.14
ournal  -0.14
ZH      -0.14
legate  -0.13
ORMAL   -0.13
porr    -0.13
hend    -0.13
Positive Logits
_mE   0.18
_tE   0.16
itt   0.16
ardo  0.15
__*/  0.14
_mB   0.14
_mD   0.14
ida   0.14
aat   0.14
#     0.13
Activations Density 0.372%