    Negative Logits
     overfitting      1.07
     concentric       1.06
     Translator       1.03
     arousal          1.01
    <unused742>       1.01
     clipping         1.00
     क्रिप्टोकर        0.96
     translator       0.95
    <unused687>       0.95
    <unused428>       0.95
    Positive Logits
     britannique      0.70
    中国              0.65
     accéder          0.63
     في               0.63
     británico        0.62
    y                 0.61
     Gobierno         0.61
    [                 0.59
     violência        0.58
    ain               0.58
    Activation Density 0.024%

    No Known Activations