Explanations
references to auditory perception or experiences involving listening
Negative Logits
Token    Logit
hread    -0.18
姿       -0.17
gor      -0.15
scene    -0.15
owa      -0.15
chin     -0.14
tems     -0.14
971      -0.14
ced      -0.14
"text    -0.14
Positive Logits
Token    Logit
kening   0.23
/read    0.18
ald      0.18
/sm      0.17
about    0.17
wig      0.16
/watch   0.16
/view    0.15
_about   0.15
isay     0.15
Activations Density: 0.026%