INDEX
Explanations
phrases related to legal proceedings or events
instances of a specific character or symbol
Negative Logits
monop        -0.79
chained      -0.73
shack        -0.70
mixed        -0.68
multiplying  -0.66
machine      -0.65
floating     -0.65
sor          -0.65
gum          -0.64
backdoor     -0.64
Positive Logits
ĸļ           0.98
º            0.93
į            0.88
£            0.86
terness      0.83
cipline      0.81
®            0.81
¹            0.81
»            0.80
agree        0.80
Activations Density: 0.210%