INDEX
    Explanations

    phrases related to instructions and recommendations for tasks

    Negative Logits
    Token        Logit
    "obel"       -0.15
    "elman"      -0.15
    "Comb"       -0.15
    "indsight"   -0.14
    "DK"         -0.14
    "cke"        -0.14
    " Jimmy"     -0.14
    "enin"       -0.14
    "697"        -0.13
    "agal"       -0.13
    Positive Logits
    Token        Logit
    " тра"       0.15
    "cona"       0.15
    "rawl"       0.15
    "ndx"        0.14
    "ites"       0.14
    "ahat"       0.14
    "yll"        0.14
    "gos"        0.13
    "ols"        0.13
    "eties"      0.13
    Act Density 0.032%

    No Known Activations