    NEGATIVE LOGITS
    Token                 Logit
    " AssemblyCulture"    -0.78
    "saraba"              -0.69
    "LookAnd"             -0.68
    "]]);"                -0.64
    " Италијани"          -0.62
    " zelfs"              -0.62
    "InvalidProtocol"     -0.61
    (blank)               -0.61
    "ArrowToggle"         -0.59
    "]));\n"              -0.59
    POSITIVE LOGITS
    Token                 Logit
    "Things"              0.57
    " MainAxisSize"       0.52
    " Things"             0.52
    " things"             0.52
    " something"          0.50
    "dep"                 0.50
    "out"                 0.50
    "me"                  0.49
    "who"                 0.49
    " who"                0.47
    Activation Density: 0.026%

    No Known Activations