Explanations
references to societal conditioning and manipulation
Negative Logits
ongan      -0.17
Bobby      -0.17
idges      -0.16
üs         -0.15
idenav     -0.15
fon        -0.15
saf        -0.14
unn        -0.14
benef      -0.14
Forms      -0.14
Positive Logits
PELL        0.17
aven        0.15
kola        0.14
ippi        0.14
ofs         0.14
ตลอด        0.13
Bilim       0.13
füg         0.13
Shar        0.13
.circular   0.13
Activation density: 0.164%