INDEX
Explanations
words related to legal matters and political activism
Negative Logits
ifter    -0.61
adish    -0.61
Jou      -0.60
asp      -0.56
fare     -0.54
owler    -0.53
oufl     -0.53
stall    -0.52
dar      -0.51
asta     -0.51
Positive Logits
thereto  1.38
to       1.08
entious  0.89
To       0.89
unto     0.84
ences    0.80
to       0.78
TO       0.76
itiz     0.76
pires    0.76
Activations Density 1.638%