Explanations
References to the Joe Rogan podcast and associated discussions.
Negative Logits
Token      Logit
hoot       -0.17
iegel      -0.16
Shah       -0.15
prompt     -0.14
ESH        -0.14
etÃŃ       -0.14
Fran       -0.14
æľĹ        -0.14
builtin    -0.14
olan       -0.14
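Some entries in these lists, such as etÃŃ and æľĹ (and èĸ among the positive logits), look garbled because they are shown in byte-level BPE display form rather than as decoded text. Assuming the model uses a GPT-2-style byte-level vocabulary (the page does not say which tokenizer it uses), the display characters can be mapped back to raw bytes with the inverse of GPT-2's byte-to-character table. The following is a minimal sketch under that assumption; token_bytes and decode_token are hypothetical helper names.

```python
# Sketch assuming a GPT-2-style byte-level BPE vocabulary (not confirmed by this page).

def bytes_to_unicode() -> dict[int, str]:
    """Map each byte 0-255 to the printable character GPT-2 uses to display it."""
    # Printable Latin-1 bytes are displayed as themselves.
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    # Remaining bytes (control chars, space, soft hyphen, ...) are shifted to U+0100 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))

# Inverse table: display character -> original byte value.
BYTE_DECODER = {ch: b for b, ch in bytes_to_unicode().items()}

def token_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a displayed token string."""
    return bytes(BYTE_DECODER[ch] for ch in token)

def decode_token(token: str) -> str:
    """Decode the bytes as UTF-8; incomplete fragments become U+FFFD."""
    return token_bytes(token).decode("utf-8", errors="replace")

for tok in ["etÃŃ", "æľĹ", "èĸ"]:
    print(repr(tok), token_bytes(tok), repr(decode_token(tok)))
```

Under that assumption, etÃŃ decodes to "etí", æľĹ is the three bytes E6 9C 97 (a single CJK character), and èĸ is only the first two bytes (E8 96) of a three-byte UTF-8 character, which is why it cannot be shown as readable text on its own.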
Positive Logits
Token      Logit
nonnull     0.18
iges        0.15
ancestor    0.15
vik         0.14
èĸ          0.14
vect        0.14
_EXTERNAL   0.13
flick       0.13
Zah         0.13
societal    0.13
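Logit lists like these are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive entries. This page does not state how its values were computed, so the following is only a minimal sketch of that common approach; w_dec, W_U, and feature_logits are hypothetical names, and real pipelines may also fold in the final layer norm before unembedding.

```python
import numpy as np

def feature_logits(w_dec: np.ndarray, W_U: np.ndarray, vocab: list[str], k: int = 10):
    """Return the k most negative and k most positive tokens for one feature.

    w_dec: the feature's decoder direction, shape (d_model,)
    W_U:   the unembedding matrix, shape (d_model, d_vocab)
    """
    logits = w_dec @ W_U            # direct effect of the feature on each vocab logit
    order = np.argsort(logits)      # ascending order of logit contribution
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return negative, positive

# Toy example with hypothetical dimensions and a four-token vocabulary.
vocab = ["a", "b", "c", "d"]
W_U = np.eye(4)
w_dec = np.array([0.5, -0.3, 0.1, 0.0])
neg, pos = feature_logits(w_dec, W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```

Because this measures only the feature's direct path to the output, tokens can appear here (for example nonnull or _EXTERNAL) without any obvious link to the explanation above.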
Activation Density: 0.002%
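If activation density is measured as the fraction of sampled token positions on which the feature's activation is above zero (an assumption; the page does not define the threshold or sample), then 0.002% = 0.002 / 100 = 2 × 10^-5, or roughly one token in 50,000. A minimal sketch of that calculation, with a hypothetical activation_density helper and made-up activations:

```python
import numpy as np

def activation_density(acts: np.ndarray, threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature's activation exceeds threshold."""
    return float(np.mean(acts > threshold))

# Hypothetical sample: 100,000 token positions, the feature fires on 2 of them.
acts = np.zeros(100_000)
acts[[123, 45_678]] = [1.7, 0.4]
density = activation_density(acts)
print(f"{density:.5%}")
```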