Explanations
phrases related to societal and political criticisms
Negative Logits
orex        -0.17
повеÑĢÑħ    -0.15
oho         -0.15
ilst        -0.14
ÐĽÐ¬        -0.14
.throw      -0.14
earch       -0.14
velt        -0.14
ued         -0.14
ila         -0.13
Positive Logits
apgolly      0.14
аÐ           0.14
Ïħγ          0.14
Authentic    0.13
åŁº          0.13
*)((         0.13
Ging         0.12
.='<         0.12
conclusions  0.12
Yuk          0.12
Activations Density 0.414%