INDEX
    Explanations
    NEGATIVE LOGITS
    osity    -0.79
    مصادر    -0.70
    copes    -0.70
    featureID    -0.68
    opedia    -0.65
    elstein    -0.65
     autorytatywna    -0.61
    oms    -0.61
    umbered    -0.60
    SharedDtor    -0.59
    POSITIVE LOGITS
     Anſ    0.80
     myſelf    0.78
     himſelf    0.75
     ſtate    0.75
     itſelf    0.74
     Theſe    0.73
     themſelves    0.70
     فريبيس    0.69
     theſe    0.68
    ſelf    0.68
    Activation Density: 0.028%

    No Known Activations