    NEGATIVE LOGITS
    Token       Value
    "ur"        0.65
    "inh"       0.63
    "are"       0.63
    "ן"         0.62
    "ge"        0.61
    "le"        0.60
    "он"        0.60
    "е"         0.57
    "iling"     0.55
    " রয়ে"      0.55
    POSITIVE LOGITS
    Token       Value
    " flow"     1.48
    " Flow"     1.38
    "Flow"      1.37
    " flows"    1.20
    "FLOW"      1.18
    " FLOW"     1.16
    "flow"      1.13
    " flusso"   1.12
    [token not recoverable]  1.12
    " flujo"    1.10
    Activation Density: 0.112%

    No Known Activations