    Negative Logits
    Token           Logit
    " rotation"     -1.61
    " Rotation"     -1.59
    " rotate"       -1.58
    " rotated"      -1.49
    " rotates"      -1.47
    " Rotate"       -1.38
    "rotation"      -1.33
    " rotations"    -1.32
    " rotating"     -1.25
    " Rotating"     -1.18
    Positive Logits
    Token           Logit
    "el"            0.74
    " of"           0.70
    "i"             0.68
    "in"            0.66
    " in"           0.65
    "y"             0.65
    "es"            0.65
    "ec"            0.64
    "ly"            0.64
    "elle"          0.63
    Activation Density: 0.081%

    No Known Activations