INDEX
Explanations
words related to progress or the potential for future development
Negative Logits
ãĥ¬         -0.14
yer         -0.14
SSERT       -0.13
Babe        -0.13
sic         -0.13
·           -0.13
uber        -0.12
probe       -0.12
rd          -0.12
cky         -0.12
Positive Logits
imity       0.16
ucer        0.15
utenberg    0.14
Watt        0.14
.nano       0.14
986         0.14
ivos        0.13
utation     0.13
rosso       0.13
ulumi       0.13
Activations Density: 0.031%