    Negative Logits
        " humain"      -0.08
        "ත්ත"          -0.08
        "_INSTANCE"    -0.08
        "Pooling"      -0.08
        "aird"         -0.08
        " Buddha"      -0.08
        " dieu"        -0.08
        " inventions"  -0.08
        "FUL"          -0.07
        " prophet"     -0.07
    Positive Logits
        " Color"             0.08
        " круг"              0.08
        " establishes"       0.08
        " hydrochlor"        0.08
        "ermine"             0.08
        " Hess"              0.07
        "través"             0.07
        " Lut"               0.07
        (token not shown)    0.07
        "reach"              0.07
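The two lists above rank tokens by how strongly this feature lowers or raises their output logits. Below is a minimal sketch of how such a ranking can be produced, assuming the effect on each token is the dot product of the feature's decoder vector with that token's column of the unembedding matrix (any final LayerNorm or bias is ignored). The helper name `top_logit_effects` and the toy matrices are hypothetical, not this dashboard's actual pipeline.

```python
import numpy as np


def top_logit_effects(decoder_row, unembed, vocab, k=10):
    """Hypothetical helper: rank vocabulary tokens by how much one SAE
    feature's decoder direction pushes their logits down or up.

    Assumes logit effect = decoder_row @ unembed, ignoring any final
    LayerNorm or bias (an assumption, not a documented pipeline).
    """
    decoder_row = np.asarray(decoder_row, dtype=float)  # shape (d_model,)
    unembed = np.asarray(unembed, dtype=float)          # shape (d_model, vocab_size)
    effects = decoder_row @ unembed                     # shape (vocab_size,)
    order = np.argsort(effects)                         # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy example with a 4-token vocabulary and d_model = 2 (made-up numbers).
vocab = ["a", "b", "c", "d"]
unembed = np.array([[1.0, -1.0, 0.5, 0.0],
                    [0.0, 0.0, 0.0, 2.0]])
decoder_row = np.array([1.0, 0.1])
negative, positive = top_logit_effects(decoder_row, unembed, vocab, k=2)
print(negative)  # [('b', -1.0), ('d', 0.2)]
print(positive)  # [('a', 1.0), ('c', 0.5)]
```

Sorting once and slicing both ends keeps the two lists consistent with each other; with a real model, `unembed` would be the model's unembedding matrix and `k=10` would match the length of the lists shown here.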
    Activation density: 0.001%
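A minimal sketch of how an activation density figure like this can be computed, assuming it is the fraction of sampled tokens on which the feature's activation is strictly positive. The threshold, the sample size, and the helper name `activation_density` are assumptions for illustration.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Fraction of sampled tokens on which a feature fires.

    Assumes "fires" means activation > threshold (threshold=0.0 here is an
    assumption; dashboards may use a different cutoff or sample size).
    """
    acts = np.asarray(activations, dtype=float)
    return float(np.mean(acts > threshold))


# Made-up sample: one firing token out of 100,000.
acts = np.zeros(100_000)
acts[42] = 0.7
density = activation_density(acts)
print(f"{density:.3%}")  # 0.001%
```

At this density a feature fires on roughly one token in 100,000, so a modest sample of text may contain no activating examples at all.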

    No known activations