INDEX
Explanations
self-related terms and concepts
references to self-identity and self-awareness
Negative Logits
GOODMAN      -0.75
Amend        -0.74
Nights       -0.73
Slay         -0.72
Ashe         -0.71
Horizon      -0.71
Orchestra    -0.70
Vid          -0.69
Leap         -0.68
IUM          -0.68
Positive Logits
destruct       1.20
destruct       1.04
conscious      0.96
upload         0.93
explanatory    0.91
same           0.82
esteem         0.80
lessly         0.78
calcul         0.77
contained      0.77
Activations Density 0.016%