Explanations
expressions of self-reflection and introspection
Negative Logits
513       -0.17
owski     -0.16
514       -0.15
borg      -0.15
.biz      -0.14
yy        -0.14
946       -0.14
.lu       -0.14
ajo       -0.13
aits      -0.13
Positive Logits
ubo        0.16
ngen       0.15
ecera      0.15
ģm         0.15
uhan       0.15
Prospect   0.14
idon       0.14
ubi        0.14
bole       0.14
Branch     0.14
Activation Density: 0.187%