INDEX
    Explanations

    proper nouns or specific names mentioned in the text

    Negative Logits
     Efq            -0.89
     myſelf         -0.88
     itſelf         -0.86
     Italijanski    -0.86
     houſe          -0.84
     صوتيه          -0.83
     Monfieur       -0.82
    UnsafeEnabled   -0.81
    PreferredItem   -0.81
     doubtnut       -0.81
    Positive Logits
     El     0.49
     en     0.48
     ID     0.47
     H      0.47
            0.46
     al     0.46
     Ra     0.45
     i      0.45
    ra      0.43
     r      0.43
    Activation Density 0.193%

    No Known Activations