Explanations
questions related to seeking help or information
NEGATIVE LOGITS
idlo      -0.16
bung      -0.15
одар      -0.14
zh        -0.14
osi       -0.13
uese      -0.13
laughter  -0.13
歩        -0.13
NOTHING   -0.13
åŀ        -0.12
POSITIVE LOGITS
anyone    0.41
Anyone    0.37
anybody   0.35
Anyone    0.33
help      0.30
HELP      0.28
Help      0.28
HELP      0.26
Need      0.24
help      0.23
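Token lists like the ones above are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below illustrates that ranking under stated assumptions: `logit_effects`, `decoder_dir`, `unembed` and the toy vocabulary are hypothetical names and data, not this dashboard's actual pipeline, and real tools may also account for final layer norm or other details.

```python
def logit_effects(decoder_dir, unembed, vocab, k=10):
    """Rank tokens by a feature's direct effect on the logits (hypothetical helper).

    decoder_dir: list of d_model floats, the feature's (assumed) decoder vector.
    unembed: d_model rows, each a list of len(vocab) floats (an assumed W_U).
    vocab: list of token strings, one per unembedding column.
    Returns (top-k positive, top-k negative) lists of (token, score) pairs.
    """
    scores = [
        # Dot product of the decoder direction with column t of the unembedding.
        sum(decoder_dir[i] * unembed[i][t] for i in range(len(decoder_dir)))
        for t in range(len(vocab))
    ]
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]          # most negative scores first
    positive = ranked[::-1][:k]    # most positive scores first
    return positive, negative


# Toy example with made-up numbers (d_model = 2, vocabulary of 4 tokens).
vocab = ["anyone", "help", "laughter", "zh"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.4, 0.2, -0.1, 0.0],
    [0.0, 0.2, -0.1, -0.4],
]
pos, neg = logit_effects(decoder_dir, unembed, vocab, k=2)
print([t for t, _ in pos])  # ['anyone', 'help']
print([t for t, _ in neg])  # ['zh', 'laughter']
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other; for a real vocabulary of tens of thousands of tokens a vectorized top-k would be the more practical choice.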
Activation Density 0.051%
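Activation density is usually read as the fraction of sampled tokens on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the `threshold` parameter and the `activation_density` name are illustrative assumptions, not taken from the dashboard):

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose feature activation exceeds `threshold` (assumed 0)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# 51 firing tokens out of 100,000 sampled tokens gives a density of 0.051%.
acts = [1.0] * 51 + [0.0] * (100_000 - 51)
density = activation_density(acts)
print(f"{density:.3%}")  # 0.051%
```

At 0.051%, the feature is active on roughly one token in two thousand, which fits a narrow concept such as requests for help.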