    Explanations

    references to rabbits and related diseases

    Negative Logits
    Token        Logit
    "anske"      -0.16
    "atro"       -0.16
    "wnd"        -0.16
    " fish"      -0.15
    "zk"         -0.15
    "thane"      -0.15
    "nicos"      -0.15
    "ấu"         -0.15
    " dog"       -0.14
    "gars"       -0.14

    Positive Logits
    Token        Logit
    " rabbit"    0.51
    " rabbits"   0.48
    " Rabbit"    0.46
    "rabbit"     0.46
    " Rab"       0.43
    " bunny"     0.41
    " Bunny"     0.38
    ".rabbit"    0.36
    "rab"        0.32
    " hare"      0.28
    Act Density 0.012%

    No Known Activations