    Negative Logits
        Token               Logit
        " كومونز"           -0.59
        "iron"              -0.50
        " אֲ"               -0.47
        "localPosition"     -0.47
        " iron"             -0.46
        " برابوك"           -0.45
        "Sasha"             -0.45
        "transition"        -0.45
        "Iron"              -0.44
        "zeits"             -0.44
    Positive Logits
        Token               Logit
        " Rabbits"          1.36
        " rabbit"           1.23
        " Rabbit"           1.15
        " rabbits"          1.15
        " RAB"              1.14
        " bunny"            1.11
        " bunnies"          1.08
        "rabbit"            1.07
        " inev"             1.04
        " thut"             1.04
    Activation Density: 0.256%

    No Known Activations