Explanations
the presence of references to guidelines or official regulations
NEGATIVE LOGITS
Token          Logit
맞 (ë§ŀ)       -0.15
ype            -0.15
abin           -0.15
iegel          -0.15
cente          -0.15
glas           -0.14
hk             -0.14
fak            -0.14
位 (ä½į)       -0.14
inati          -0.14
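Two of the negative-logit tokens, ë§ŀ and ä½į, are not typos. They look like vocabulary strings from a GPT-2-style byte-level BPE tokenizer, which maps every raw byte to a printable character. The page does not name the model or its tokenizer, so treating it as GPT-2-style is an assumption. The sketch below inverts that byte-to-character table and decodes the result as UTF-8. Under that assumption, the strings come out as the Korean syllable 맞 and the Chinese character 位.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte-to-printable-character table used by GPT-2-style byte-level BPE."""
    # Printable Latin-1 bytes map to the character with the same code point.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Every other byte (controls, space, etc.) is shifted to code points 256 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


def decode_token(token: str) -> str:
    """Turn a byte-level vocabulary string such as 'ë§ŀ' back into readable text."""
    char_to_byte = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # errors="replace" keeps tokens that hold only part of a character from raising.
    return raw.decode("utf-8", errors="replace")


for tok in ["ë§ŀ", "ä½į", "ype"]:
    print(repr(tok), "->", repr(decode_token(tok)))
```

Plain ASCII tokens such as `ype` pass through unchanged, because their bytes map to themselves.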
POSITIVE LOGITS
Token          Logit
andy           0.16
namespace      0.15
cap            0.15
ardy           0.14
aked           0.14
gang           0.14
aque           0.14
neutral        0.14
Sale           0.13
ard            0.13
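Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix. The most negative and most positive entries are then read off. The page does not say how its values were computed, so this method is an assumption. The sketch also ignores any final LayerNorm. The weights, sizes, toy vocabulary, and the `top_logits` helper are all hypothetical stand-ins.

```python
import numpy as np


def top_logits(w_dec: np.ndarray, w_u: np.ndarray, vocab: list[str], k: int = 10):
    """Rank tokens by how strongly one feature's decoder direction moves their logits."""
    # Project the feature's residual-stream direction through the unembedding.
    logits = w_dec @ w_u  # shape: (vocab_size,)
    order = np.argsort(logits)  # ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive


# Hypothetical toy sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
d_model, vocab_size = 16, 50
w_dec = rng.normal(size=d_model)
w_u = rng.normal(size=(d_model, vocab_size))
vocab = [f"tok{i}" for i in range(vocab_size)]

neg, pos = top_logits(w_dec, w_u, vocab, k=3)
print("negative:", neg)
print("positive:", pos)
```

With real weights, `w_dec` would be the feature's row of the decoder matrix and `w_u` the model's unembedding.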
Activation Density: 0.147%
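Activation density is usually reported as the share of token positions on which the feature's activation is above zero. For scale, 0.147% works out to about 1.5 active positions per 1,000 tokens. The zero threshold and the `activation_density` helper below are assumptions for illustration, not taken from this page.

```python
import numpy as np


def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Percentage of token positions where the feature's activation exceeds threshold."""
    return 100.0 * float(np.mean(acts > threshold))


# Hypothetical activations over 8 token positions; 2 of them are active.
acts = np.array([0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.2])
print(f"{activation_density(acts):.3f}%")  # prints 25.000%
```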