INDEX
    Explanations

    predictions

    The neuron detects language describing a model producing predictions or decisions.

    Negative Logits
     ortaya         -0.06
     Yus            -0.06
    ALL             -0.06
     THPT           -0.06
    all             -0.06
    .ONE            -0.06
                    -0.06
    One             -0.06
    Method          -0.05
    editary         -0.05
    Positive Logits
    UIImage         0.08
    .ask            0.07
    rač             0.07
     fresh          0.07
     preds          0.07
     пока           0.07
     जन             0.07
     fetisch        0.07
     Invite         0.07
     achievements   0.06
    Activation Density: 0.013%

    No Known Activations