INDEX
    Explanations
    NEGATIVE LOGITS
    " doros"               -0.66
    " חיצוניים"            -0.62
    "==============="      -0.55
    " ogóle"               -0.55
    "uddles"               -0.55
    " ControllerBase"      -0.54
    "Whenever"             -0.53
    "Océ"                  -0.53
    "getMinutes"           -0.53
    "++++++++++++++++"     -0.53
    POSITIVE LOGITS
    " design"      1.51
    " Design"      1.48
    "Design"       1.39
    " designs"     1.37
    "design"       1.29
    " Designs"     1.28
    "Designs"      1.19
    " DESIGN"      1.16
    "DESIGN"       1.16
    "designs"      1.13
    Activation Density: 0.019%

    No Known Activations