Explanations
concepts related to abstract ideas and theories
Negative Logits
unami     -0.16
uggy      -0.15
DOG       -0.15
ÅĽÄĩ      -0.15
age       -0.15
houses    -0.14
ager      -0.14
aments    -0.14
isen      -0.14
iesen     -0.14
Positive Logits
ively     0.23
ually     0.22
concepts  0.17
ual       0.17
Concepts  0.16
strup     0.16
734       0.15
718       0.15
drv       0.15
ecure     0.15
Activations Density 0.053%