INDEX
    Negative Logits
    Token            Logit
    "Match"          -0.08
    "Nhan"           -0.08
    "\tmatch"        -0.08
    ".codes"         -0.08
    "Supervisor"     -0.08
    ".Formatting"    -0.08
    " match"         -0.08
    " maestro"       -0.08
    " volte"         -0.08
    " Match"         -0.08
    Positive Logits
    Token            Logit
    " convex"        0.15
    " Hess"          0.10
    [token missing]  0.09
    " curve"         0.09
    " monot"         0.09
    " discour"       0.09
    "åt"             0.08
    " induces"       0.08
    " Conv"          0.08
    " curvas"        0.08
    Activation Density: 0.017%

    No Known Activations