INDEX
    Explanations

    references to rabbits and bunny-related terms

    instances of the words "rabbit" and "bunny."

    Negative Logits
    "omething"     -0.93
    "ician"        -0.85
    "igmat"        -0.84
    "inia"         -0.78
    "rylic"        -0.78
    "orie"         -0.77
    "itutional"    -0.76
    "ructure"      -0.75
    "xit"          -0.73
    "inen"         -0.73

    Positive Logits
    "MQ"            1.18
    " Hole"         0.96
    " rabbit"       0.88
    "meat"          0.86
    " Rabbit"       0.85
    " Wilde"        0.82
    " Nest"         0.79
    " rabbits"      0.76
    " Hunt"         0.72
    "wald"          0.69
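    Logit lists like these are commonly produced by projecting the feature's decoder direction onto the model's unembedding matrix and keeping the k highest and k lowest scores. The sketch below assumes that method (ignoring the final layer norm) and a (d_model, vocab_size) unembedding layout. It is not necessarily how this dashboard computed its numbers, and the function name top_logit_effects is hypothetical.

```python
import numpy as np

def top_logit_effects(w_dec_row, W_U, vocab, k=10):
    """Project one feature's decoder direction onto the unembedding (assumed method).

    w_dec_row: (d_model,) decoder vector for the feature.
    W_U: (d_model, vocab_size) unembedding matrix (assumed layout).
    vocab: list of token strings indexed by token id.
    Returns (positive, negative): lists of (token, score), k entries each.
    """
    # Direct effect of the feature direction on every token's logit.
    scores = np.asarray(w_dec_row) @ np.asarray(W_U)
    order = np.argsort(scores)  # token ids sorted by ascending score
    negative = [(vocab[i], float(scores[i])) for i in order[:k]]
    positive = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return positive, negative

# Toy example: 4-dim model, 4-token vocabulary, identity unembedding.
vocab = ["a", " rabbit", "b", " Hole"]
W_U = np.eye(4)
w = np.array([0.1, 0.9, -0.5, 0.7])
pos, neg = top_logit_effects(w, W_U, vocab, k=2)
print(pos)  # [(' rabbit', 0.9), (' Hole', 0.7)]
print(neg)  # [('b', -0.5), ('a', 0.1)]
```

    Leading spaces in the tokens matter: " rabbit" and "rabbit" are different vocabulary entries, which is why the tokens above are quoted.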
    Activation Density 0.021%
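    An activation density of 0.021% means the feature is active on roughly 21 of every 100,000 tokens. The sketch below assumes density is the percentage of token positions where the feature's activation exceeds zero. The threshold and the helper name activation_density are illustrative assumptions, not taken from the dashboard.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of token positions where the feature activation exceeds threshold."""
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy example: the feature fires on 21 of 100,000 tokens.
acts = np.zeros(100_000)
acts[:21] = 1.0
print(activation_density(acts))  # about 0.021 (percent)
```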

    No Known Activations