INDEX
Explanations
references to concepts of awareness and identity
Negative Logits
             -0.15
(...         -0.15
andr         -0.15
bÃŃr         -0.14
ĵåIJį         -0.14
ÑĢав         -0.13
endoza       -0.13
oyer         -0.13
controlId    -0.13
estruct      -0.13
Positive Logits
slave        0.21
Slave        0.19
slaves       0.19
slave        0.19
Slave        0.18
Gore         0.17
tarn         0.16
arlar        0.15
Kur          0.15
sandals      0.15
Activations Density 0.004%
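
Some of the negative-logit tokens (bÃŃr, ÑĢав, ĵåIJį) look like raw byte-level BPE vocabulary strings rather than readable text. The sketch below shows one way to turn such strings back into text. It assumes a GPT-2 style byte-to-character table, which this page does not confirm, and it passes through any character outside that table unchanged, on the further assumption that part of the string was already decoded somewhere upstream. The names bytes_to_unicode and decode_token are illustrative helpers, not part of any tool referenced here.

```python
def bytes_to_unicode() -> dict:
    """Byte -> printable character table (assumed GPT-2 style byte-level BPE)."""
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in printable}
    extra = 0
    for b in range(256):
        if b not in table:
            # Non-printable bytes are shifted to code points 256, 257, ...
            table[b] = chr(256 + extra)
            extra += 1
    return table


# Inverse table: displayed character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Map a displayed vocabulary string back to text (hypothetical helper)."""
    raw = bytearray()
    for ch in token:
        if ch in UNICODE_TO_BYTE:
            raw.append(UNICODE_TO_BYTE[ch])
        else:
            # Assumption: characters outside the table are already real text.
            raw.extend(ch.encode("utf-8"))
    # Tokens may start or end mid-character, so invalid bytes are replaced.
    return bytes(raw).decode("utf-8", errors="replace")


if __name__ == "__main__":
    for tok in ["bÃŃr", "ÑĢав", "ĵåIJį"]:
        print(repr(tok), "->", repr(decode_token(tok)))
```

Under these assumptions the first two tokens read as bír and рав. The third begins with a stray continuation byte, so only part of it decodes to a readable character; that is expected for tokens that split a multi-byte character.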