INDEX
    Explanations

    attends to digit tokens from numerical token indices

    Head Attr Weights
    Head  Weight
    0     0.15
    1     0.31
    2     0.17
    3     0.06
    4     0.08
    5     0.06
    6     0.06
    7     0.07
    Negative Logits
    Logit  Token
    -0.31  "帖最后由"
    -0.26  " okuyayım"
    -0.24  "IsContent"
    -0.23  "ID"
    -0.23  "Datuak"
    -0.23  " indiv"
    -0.23  "('/:"
    -0.22  "AccessorTable"
    -0.22  "Mohammed"
    -0.22  " Full"
    Positive Logits
    Logit  Token
    0.39   "يكب"
    0.39   "hyrchwyd"
    0.37   "zeitig"
    0.37   "issory"
    0.36   " ujednoznacz"
    0.35   "tomation"
    0.35   "viders"
    0.34   "Enllaços"
    0.33   "opedic"
    0.33   " Petru"
    Act Density: 0.380%

    No Known Activations