INDEX
Explanations
words related to mandates or policies that are one-size-fits-all
Negative Logits
to           -0.60
%%           -0.59
diction      -0.55
Nanto        -0.55
paper        -0.54
the          -0.52
tot          -0.51
IQ           -0.50
Wikipedia    -0.50
ped          -0.50
Positive Logits
[no visible token]    0.78
č                     0.78
À                     0.78
rawdownload           0.77
ü                     0.77
Ă                     0.77
ę                     0.77
ė                     0.77
ć                     0.77
Ě                     0.77
Activations Density 0.687%
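
The density figure above is usually read as the share of token positions on which the feature fires at all. The sketch below illustrates that reading under stated assumptions: the function name `activation_density`, the `threshold` parameter, and the sample activations are all hypothetical and are not taken from this page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: a feature counts as "active" on a token when its activation
    is strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 1000 token positions, 7 of which activate the feature.
sample = [0.0] * 993 + [1.2, 0.4, 3.1, 0.9, 2.2, 0.1, 5.0]
print(f"{activation_density(sample):.3%}")
```

On this reading, a density below 1% means the feature is sparse: it stays silent on the large majority of tokens.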