    Explanations

    numeric values or constants

    Negative Logits
    "']));"      -0.77
    "]})"        -0.67
    "')));"      -0.67
    ")]);"       -0.67
    "']);\n"     -0.65
    ">');"       -0.63
    "]');"       -0.63
    "!');"       -0.63
    "');\n"      -0.63
    ",:);"       -0.61
    Positive Logits
    " ONE"        1.14
    "ONE"         1.10
    "One"         1.07
    " one"        1.06
    "one"         1.05
    " One"        1.00
    " Один"       0.86
    "Один"        0.78
    " jednego"    0.76
    " jednom"     0.76
    Act Density 0.569%

    No Known Activations