INDEX
    Explanations

    references to numeric sections in documents

    Negative Logits
    Token       Logit
    "3"         -0.16
    "av"        -0.16
    "0"         -0.16
    " hypers"   -0.16
    "750"       -0.15
    "2"         -0.15
    "flt"       -0.15
    "nt"        -0.15
    "ous"       -0.15
    " kost"     -0.14
    Positive Logits
    Token       Logit
    "naires"    0.20
    "naire"     0.19
    "iu"        0.19
    "hc"        0.16
    "že"        0.16
    "ally"      0.15
    "plots"     0.15
    "hx"        0.15
    "ariat"     0.14
    "itto"      0.14
    Activation Density: 0.030%

    No Known Activations