Explanations
references to social reform and progressive policies
Negative Logits
Token          Logit
anta           -0.15
avana          -0.15
æµľ            -0.14
stats          -0.14
ãĤ·ãĥ¼         -0.14
arker          -0.14
edException    -0.13
/stats         -0.13
.son           -0.13
è£ı            -0.13
Positive Logits
Token            Logit
universal        0.22
universal        0.19
Universal        0.19
Universal        0.17
univers          0.16
iversal          0.16
Transportation   0.16
ewe              0.16
UNIVERS          0.16
repar            0.15
Activations Density: 0.081%