    Explanations

    elements related to humorous or playful situations

    Negative Logits
    Token         Logit
    " bench"      -0.20
    " Bench"      -0.19
    "bol"         -0.18
    "jom"         -0.17
    " Lincoln"    -0.17
    "inston"      -0.16
    "_unix"       -0.16
    " LIN"        -0.15
    "bench"       -0.15
    "487"         -0.15

    Positive Logits
    Token         Logit
    " Brian"      1.23
    "Brian"       1.12
    " Brain"      0.59
    " β"          0.55
    " bean"       0.51
    "Brain"       0.51
    " brain"      0.49
    " beta"       0.48
    " Bean"       0.47
    "rian"        0.47
    Activation Density: 0.015%

    No Known Activations