Explanations
phrases related to participating in events or activities
references to political parties or affiliations
Negative Logits
ħĭ              -0.94
¥µ              -0.80
éĹĺ             -0.77
OME             -0.77
hirt            -0.76
hower           -0.76
æ©Ł             -0.74
doms            -0.74
士              -0.73
;;;;;;;;;;;;    -0.71
Positive Logits
rot             0.96
amount          0.89
agraph          0.88
allel           0.86
rots            0.83
anoia           0.79
aged            0.79
ILCS            0.79
ret             0.78
icularly        0.78
Activations Density 0.009%