INDEX
Explanations
phrases related to legal procedures, laws, and political discussions
references to legal and medical topics
Negative Logits
ãĢ	-0.80
âĸº	-0.74
ðŁij	-0.70
âĢ	-0.69
@#&	-0.68
ãĢIJ	-0.67
âĶĤ	-0.66
Âł Âł Âł Âł Âł Âł Âł Âł	-0.65
âĸº	-0.64
Æ	-0.64
Positive Logits
themselves	0.96
merely	0.88
outright	0.84
selves	0.83
spontaneously	0.76
respectively	0.76
inadvertently	0.76
altogether	0.75
falsely	0.75
selves	0.75
Activations Density 0.820%