INDEX
    Explanations

    flow and richness of concepts

    Negative Logits
    Token             Value
    "are"             0.57
    "edor"            0.54
    "geç"             0.54
    "å"               0.53
    "afstand"         0.52
    "penup"           0.52
    "ane"             0.50
    "ighter"          0.50
    "പ്പി"             0.50
    "edil"            0.50
    Positive Logits
    Token             Value
    " flow"           0.98
    " overflow"       0.93
    " Flow"           0.89
    " Overflow"       0.87
    " overflowing"    0.87
    " flows"          0.82
    " flujo"          0.82
    " поток"          0.82
    "Flow"            0.81
    " overflows"      0.81
    Activation Density: 0.431%

    No Known Activations