    NEGATIVE LOGITS
    Token             Logit
    "s"               1.48
    "in"              1.30
    " in"             1.26
                      1.12
    "v"               1.11
    "1"               1.10
    "ng"              1.03
    "w"               1.00
    "pre"             0.96
    "mu"              0.96
    POSITIVE LOGITS
    Token             Logit
    "ר"               1.05
    " functionals"    0.96
    "λούν"            0.94
    " functional"     0.92
    " additives"      0.91
    " fenders"        0.90
    " feminists"      0.89
    " adversaries"    0.88
    "ில்"              0.86
    " Functional"     0.85
    Activation Density: 0.025%

    No Known Activations