INDEX
    Explanations

    Instances of URLs and mentions of specific characters or their actions.

    Negative Logits
    " Fucking"   -0.22
    " fucking"   -0.21
    " fucked"    -0.19
    "fuck"       -0.19
    " shit"      -0.18
    " fuck"      -0.18
    " fucks"     -0.16
    " bullshit"  -0.16
    "_MACRO"     -0.16
    " FUCK"      -0.15
    Positive Logits
    " Dil"       0.44
    " dil"       0.35
    " Dog"       0.26
    "Dog"        0.24
    " dilation"  0.23
    " Rat"       0.21
    " diluted"   0.21
    " dog"       0.20
    "dog"        0.19
    " Boss"      0.18
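    Logit lists like these are usually read as the feature's direct effect on the output vocabulary: a positive value suggests that firing the feature raises that token's logit, and a negative value suggests it lowers it. The following is a minimal sketch that assumes each value is the dot product of the feature's decoder vector with one column of the unembedding matrix. The names `decoder_dir`, `W_U`, `logit_effects`, and `top_and_bottom` are hypothetical, and the toy numbers are made up for illustration. The dashboard's exact procedure, including any normalization, is not stated here.

```python
from typing import List, Sequence, Tuple


def logit_effects(decoder_dir: Sequence[float],
                  W_U: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction through the unembedding.

    Assumption: decoder_dir has length d_model and W_U is d_model x vocab_size
    (row-major). Returns decoder_dir @ W_U, one logit contribution per token.
    """
    if len(decoder_dir) != len(W_U):
        raise ValueError("decoder_dir length must equal the number of rows in W_U")
    vocab_size = len(W_U[0])
    return [
        sum(decoder_dir[i] * W_U[i][t] for i in range(len(decoder_dir)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects: Sequence[float], vocab: Sequence[str], k: int
                   ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k largest and the k smallest (token, effect) pairs."""
    ranked = sorted(zip(vocab, effects), key=lambda pair: pair[1])
    bottom = ranked[:k]          # most negative effect first
    top = ranked[::-1][:k]       # most positive effect first
    return top, bottom


# Toy usage: 2-dimensional residual stream, 4-token vocabulary (hypothetical values).
vocab = [" dog", " shit", " Dil", " the"]
W_U = [[0.5, -0.3, 0.9, 0.0],
       [0.1, -0.2, 0.2, 0.1]]
decoder_dir = [1.0, 0.5]
top, bottom = top_and_bottom(logit_effects(decoder_dir, W_U), vocab, k=1)
print(top, bottom)
```

    Token strings are kept exactly as the tokenizer emits them, so " dog" (leading space) and "dog" are ranked as separate entries, as they are in the lists above.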
    Activation Density: 0.004%
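    Activation density is the share of sampled tokens on which the feature fires. A density of 0.004% works out to about 4 firing tokens per 100,000. The following is a minimal sketch that assumes "fires" means an activation strictly above a threshold of zero. The helper `activation_density` and its `threshold` parameter are hypothetical and are not taken from the dashboard.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.

    Hypothetical helper: assumes "active" means strictly greater than threshold.
    """
    if len(activations) == 0:
        raise ValueError("need at least one token activation")
    firing = sum(1 for a in activations if a > threshold)
    return 100.0 * firing / len(activations)


# Usage: 4 firing tokens out of 100,000 gives a density of 0.004%.
acts = [0.0] * 99_996 + [1.2] * 4
print(f"{activation_density(acts):.3f}%")  # 0.004%
```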

    No Known Activations