INDEX
    Explanations

    roll and subsequent actions

    Negative Logits
    "urnya"       0.49
    "ua"          0.48
    "metik"       0.48
                  0.48
    "ноу"         0.46
    "organisms"   0.45
    "larından"    0.45
    "UTRAL"       0.44
    "arial"       0.44
    "matmul"      0.44
    Positive Logits
    " rolling"    1.52
    " roll"       1.45
    "Roll"        1.41
    " rolled"     1.40
    " Rolling"    1.40
    " Roll"       1.35
    "Rolling"     1.32
    "ROLL"        1.24
    " ROLL"       1.21
    " rollout"    1.18
    Act Density 0.020%

    No Known Activations