INDEX
    Explanations

    pre-installed, enforced, fundamental

    Negative Logits
    "2"             0.51
    "3"             0.48
    "6"             0.44
    "raw"           0.43
    "7"             0.43
    " -"            0.42
    " commonplace"  0.42
    " either"       0.41
    " at"           0.40
    " پیر"          0.40
    Positive Logits
    " الْم"         0.41
    "čky"           0.40
    " Majid"        0.40
    " Dyson"        0.40
    " भूमि"         0.38
    " त्यात"        0.38
    " و"            0.38
    " krijgen"      0.38
    " అంశ"          0.38
    " viktigt"      0.38
    Activation Density: 0.001%

    No Known Activations