INDEX
Explanations
references to systemic concepts or ideas, particularly related to societal structures or problems
Negative Logits
ff        -0.16
uda       -0.14
childs    -0.14
es        -0.13
rit       -0.13
enus      -0.13
rou       -0.13
ami       -0.13
vr        -0.13
범        -0.13
Positive Logits
ediator   0.15
roker     0.15
ILT       0.14
αυτό      0.14
afx       0.14
_globals  0.14
نامه      0.14
Truthy    0.14
ipse      0.13
Sez       0.13
Activations Density 0.164%