INDEX
Explanations
phrases that indicate warnings or calls to action regarding societal or systemic issues
Negative Logits
Token      Logit
},'        -0.15
един       -0.15
gregar     -0.14
iri        -0.14
Welch      -0.14
ENU        -0.14
ohn        -0.13
ered       -0.13
æ£         -0.13
ëĭ´        -0.13
Positive Logits
Token      Logit
roe        0.16
Ramp       0.15
scale      0.15
bod        0.14
Bod        0.14
Scale      0.14
Dispatch   0.14
Alternate  0.14
scale      0.14
än         0.14
Activations Density: 0.261%