INDEX
    Explanations

    phrases indicating frequency or repetition in actions

    Negative Logits
    atis       -0.19
    usi        -0.16
    encoded    -0.15
    suit       -0.14
    aber       -0.14
    etic       -0.14
     olanlar   -0.14
    íĥĪ        -0.14
    Wire       -0.13
    ivated     -0.13
    Positive Logits
    omics      0.17
    üstü       0.16
     dik       0.15
    aira       0.15
    ilty       0.15
    DBC        0.14
    ideos      0.14
    anno       0.14
    azu        0.14
    jak        0.13
    Activation Density: 0.018%

    No Known Activations