INDEX
Explanations
actions related to causing harm or exerting control over others
phrases related to criminal activities and their impacts
Negative Logits
zai         -0.62
nai         -0.61
ftime       -0.59
hua         -0.59
nen         -0.57
atic        -0.56
icz         -0.55
onte        -0.54
ukong       -0.53
Accessed    -0.53
Positive Logits
them        2.36
THEM        1.95
them        1.87
Them        1.72
ones        1.65
theirs      1.54
they        1.19
those       1.18
They        1.17
hers        1.09
Activations Density: 2.114%