INDEX
Explanations
greetings or salutations in the text
Negative Logits
Token    Logit
arra     -0.16
nte      -0.15
åύ       -0.15
éĢļ      -0.14
οÏħÏĤ    -0.13
.tv      -0.13
furt     -0.13
parts    -0.13
ube      -0.13
ters     -0.13
Positive Logits
Token       Logit
ooo         0.26
everyone    0.24
oooo        0.23
oo          0.23
everybody   0.23
oooooooo    0.22
Everyone    0.22
Kitty       0.22
everyone    0.20
_world      0.20
Activations Density: 0.018%