Explanations
instances of people and their actions or states of being
Negative Logits
Token          Logit
æĭ             -0.17
reinterpret    -0.15
asil           -0.15
izza           -0.15
enant          -0.15
angl           -0.15
ennon          -0.14
ombs           -0.14
.pp            -0.14
aware          -0.13
Positive Logits
Token          Logit
seen           0.36
seen           0.33
Seen           0.33
heard          0.32
heard          0.32
spotted        0.31
Seen           0.31
Heard          0.28
_seen          0.25
observed       0.24
Activations Density 0.037%
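
The density figure above can be read as a simple ratio. The sketch below is illustrative only: it assumes that "density" means the fraction of sampled tokens on which the feature's activation is above zero, which this page does not state explicitly. The function name `activation_density` and the sample values are hypothetical, not taken from the source.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of tokens whose activation exceeds `threshold`.

    Assumption: density = (number of active tokens) / (total tokens sampled).
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 10,000 tokens, of which 4 have a nonzero activation.
sample = [0.0] * 9996 + [1.2, 0.4, 2.5, 0.1]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.037% would correspond to roughly 37 active tokens per 100,000 sampled.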