INDEX
    Explanations

    actions related to movement or direction

    Negative Logits
    Token         Logit
    "ont"         -0.15
    "kt"          -0.15
    "t"           -0.15
    "asm"         -0.14
    " Occ"        -0.14
    "ål"          -0.14
    "abez"        -0.14
    "vas"         -0.13
    "ly"          -0.13
    "aling"       -0.13
    Positive Logits
    Token         Logit
    "chwitz"      0.17
    " orth"       0.15
    "ãģ"          0.15
    " Predictor"  0.15
    "ORTH"        0.14
    "ynn"         0.13
    "phia"        0.13
    "osten"       0.13
    "adc"         0.13
    "inh"         0.13
    Activation Density: 0.004%

    No Known Activations