INDEX
    Explanations

    references to individuals and their roles or contributions within a context

    Negative Logits
    "g"                   -1.20
    " g"                  -1.10
    [token not rendered]  -0.74
    "gens"                -0.67
    "𝑔"                   -0.67
    "ging"                -0.64
    "ged"                 -0.63
    "gen"                 -0.63
    "gating"              -0.61
    "gha"                 -0.61
    Positive Logits
    " Գ"                  1.02
    " Г"                  1.00
    " GG"                 0.99
    " G"                  0.98
    " Gu"                 0.94
    " GC"                 0.89
    " Gi"                 0.88
    " GV"                 0.88
    " GX"                 0.87
    " GF"                 0.87
    Activation Density: 0.955%

    No Known Activations