Explanations
reflections and self-perception in various contexts
Negative Logits
martyr       -0.17
Municipal    -0.15
roj          -0.15
theid        -0.15
uds          -0.14
.asc         -0.14
ufs          -0.14
osate        -0.13
sst          -0.13
acock        -0.13
Positive Logits
mirror        0.80
mirrors       0.76
Mirror        0.68
mirror        0.68
Mirror        0.63
Mir           0.58
reflection    0.58
éķľ           0.56
mirrored      0.56
mir           0.56
Activations Density: 0.064%
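
One entry in the positive-logit list, éķľ, looks garbled but is probably a token shown in byte-level form. The sketch below assumes the model uses a GPT-2-style byte-level BPE vocabulary (the page does not say so), in which each raw byte is displayed as one printable character. Under that assumption the three characters decode to the UTF-8 bytes E9 95 9C, which is the Chinese character 镜 ("mirror"), consistent with the other top tokens. The function names are hypothetical and only for illustration.

```python
def byte_level_decoder():
    """Map each display character back to its raw byte.

    Assumption: GPT-2-style byte-to-unicode table. Printable bytes are
    shown as themselves; all other bytes are shifted to code points 256+.
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAD))
        + list(range(0xAE, 0x100))
    )
    mapping = {}
    shift = 0
    for b in range(256):
        if b in printable:
            mapping[chr(b)] = b
        else:
            mapping[chr(256 + shift)] = b
            shift += 1
    return mapping


def decode_token(token):
    """Turn a byte-level token string into the text it stands for."""
    table = byte_level_decoder()
    return bytes(table[ch] for ch in token).decode("utf-8")


# The token from the positive-logit list, written out as it appears there.
print(decode_token("éķľ"))
```

Tokens made only of ASCII letters, such as mirror, decode to themselves. In this scheme a leading space is displayed as Ġ, which may explain why some tokens appear twice in the list with different values: the two entries could differ only by a leading space that was lost when the page was extracted.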