INDEX
Explanations
sentences or phrases expressing complex emotional or philosophical ideas
Negative Logits
oj          -0.15
thew        -0.15
allon       -0.14
lein        -0.13
ooks        -0.13
Lehr        -0.13
eyh         -0.13
ove         -0.13
_fake       -0.13
iyas        -0.13
Positive Logits
ugar         0.15
:            0.15
TM           0.14
?:           0.14
noDB         0.14
pestic       0.13
821          0.13
KP           0.13
Contributor  0.13
ugen         0.13
Activation Density: 0.162%