    Negative Logits
    line     -1.09
    lines    -0.96
    less     -0.83
     the     -0.80
    like     -0.79
    i        -0.79
    if       -0.78
    the      -0.78
    a        -0.75
    no       -0.74
    Positive Logits
    IntoConstraints   1.00
    )");\n            0.98
    '}>               0.97
    ]));\n            0.95
    ")));\n           0.93
    NUMX              0.93
    )"),              0.93
    '));\n            0.91
     stället          0.91
    '},\n             0.90
    Activation Density: 0.853%

    No Known Activations