Explanations
questions, thanks, and informal remarks
Negative Logits
💀            1.18
:/            1.13
xD            1.08
fucking       1.02
✨            1.01
fucked        1.00
idk           0.99
fuck          0.96
👾            0.95
⚠️            0.93
Positive Logits
Incidentally  1.21
Incidentally  1.16
Guess         1.08
Wouldn        1.04
incidentally  0.97
guess         0.96
Gee           0.96
Believe       0.95
Guess         0.94
terrific      0.93
Activations Density 0.027%