INDEX
    Explanations

    physical attributes and actions

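    NEGATIVE LOGITS
    "西"  0.47
    (token not shown)  0.46
    (non-printing token)  0.45
    (token not shown)  0.45
    (token not shown)  0.44
    "Provide"  0.41
    "ال"  0.40
    " Katal"  0.40
    (token not shown)  0.39
    "ە"  0.39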
    POSITIVE LOGITS
    " líquido"  0.48
    " graft"  0.48
    " cukup"  0.47
    " carbono"  0.46
    " odu"  0.46
    " aprend"  0.45
    " gespre"  0.45
    "સિંહ"  0.45
    (token not shown)  0.45
    " happ"  0.44
    Act Density 0.002%

    No Known Activations