INDEX
    NEGATIVE LOGITS
    Token           Value
    " was"          0.88
    " for"          0.72
    " كان"          0.70
    (blank)         0.68
    " ك"            0.66
    " modelos"      0.63
    (blank)         0.63
    " alimentos"    0.63
    " at"           0.62
    " landen"       0.62
    POSITIVE LOGITS
    Token           Value
    "'"             0.78
    "för"           0.67
    "l"             0.66
    "↵↵"            0.65
    "orb"           0.63
    "fahrt"         0.61
    "action"        0.59
    "hr"            0.59
    "lıkla"         0.59
    "единен"        0.58
    Activation Density: 0.269%

    No Known Activations