Explanations
expressions of self-awareness and identity
Negative Logits
Token           Logit
ules            -0.17
adle            -0.17
ÑĢави           -0.15
tail            -0.14
forces          -0.14
æķ¬             -0.13
Rouge           -0.13
ãĥĵãĥ¼          -0.13
Tail            -0.13
intermediate    -0.13
Positive Logits
Token               Logit
Workbook            0.21
ego                 0.21
Christ              0.17
Projection          0.16
hol                 0.16
Perception          0.16
brothers            0.16
compileComponents   0.16
projection          0.16
Reality             0.16
Activations Density: 0.002%