INDEX
Explanations
references to socio-political events and actions
NEGATIVE LOGITS
ืà¹ī      -0.14
dzi      -0.14
idor     -0.14
Suche    -0.14
ÙĨÛĮÙĨ   -0.14
ITLE     -0.13
ikan     -0.13
кÑĥл     -0.13
Scoped   -0.13
igua     -0.13
POSITIVE LOGITS
how      0.28
why      0.28
inside   0.24
Why      0.22
How      0.22
Opinion  0.21
these    0.20
by       0.20
inside   0.20
Inside   0.19
Activations Density: 0.176%