INDEX
    Explanations

    human intelligence and language

    Negative Logits
    "l"      0.97
    "to"     0.91
    "v"      0.91
    "j"      0.86
    "of"     0.84
    "g"      0.82
    "ll"     0.79
    "is"     0.77
    "p"      0.77
    "ك"      0.75
    Positive Logits
    "5"            0.98
    ";"            0.94
    ")"            0.80
    " at"          0.77
    " human"       0.74
    " for"         0.71
    " pharmacist"  0.69
    "."            0.68
    "]"            0.68
    "ר"            0.68
    Activation Density: 0.043%

    No Known Activations