INDEX
NEGATIVE LOGITS
âĶľ                 -0.67
ãĤ¶                  -0.64
vernment            -0.62
eworks              -0.62
aternity            -0.62
natureconservancy   -0.60
ullivan             -0.60
ucc                 -0.60
ername              -0.59
sshd                -0.59
POSITIVE LOGITS
ting       0.98
ted        0.88
ned        0.80
ning       0.78
ishment    0.77
eful       0.73
ner        0.71
ners       0.71
ray        0.69
ingly      0.69
Activations Density 8.840%