INDEX
Explanations
words related to concepts of awareness and understanding
Negative Logits
Token      Logit
agli       -0.15
agem       -0.15
ãĥĶãĥ¼     -0.15
Alta       -0.14
yla        -0.14
coli       -0.14
mentions   -0.14
cket       -0.13
ensch      -0.13
paren      -0.13
Positive Logits
Token      Logit
Ign        0.24
oring      0.23
acio       0.23
ition      0.22
orer       0.22
ite        0.21
eous       0.20
azio       0.20
ored       0.20
orable     0.20
Activations Density: 0.017%
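
The negative-logit token ãĥĶãĥ¼ looks garbled, but it may simply be a vocabulary entry shown in its byte-level form. The sketch below assumes the model uses a GPT-2-style byte-level BPE tokenizer; the page does not say which tokenizer is in use, so treat this as an assumption. Under that assumption, each character of a token string stands for one raw byte, and reversing the mapping recovers the underlying UTF-8 text. The helper name `decode_token` is hypothetical and introduced only for illustration.

```python
def bytes_to_unicode():
    """Byte -> printable character table used by GPT-2-style byte-level BPE."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    cs = bs[:]
    n = 0
    # Every remaining byte is assigned a code point starting at 256.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))


def decode_token(token):
    """Hypothetical helper: turn a byte-level token string back into text."""
    char_to_byte = {c: b for b, c in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # Tokens can end mid-character, so replace invalid sequences
    # instead of raising.
    return raw.decode("utf-8", errors="replace")


print(decode_token("ãĥĶãĥ¼"))  # ピー
print(decode_token("agli"))    # agli
```

If the assumption holds, the token corresponds to the katakana string ピー, while plain ASCII tokens such as agli decode to themselves.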