    Explanations

    negative numerical values or indicators

    NEGATIVE LOGITS
    .
    -0.81
     in
    -0.56
     deter
    -0.54
    ,
    -0.54
    ?
    -0.52
    \}.
    -0.51
     is
    -0.51
    ],
    -0.50
     zel
    -0.50
    utnant
    -0.49
    POSITIVE LOGITS
     fevere
    0.97
     %-
    0.92
     occaf
    0.91
    AxisAlignment
    0.90
     ainfi
    0.88
    IntoConstraints
    0.86
     ſever
    0.85
     Савезне
    0.84
     Monfieur
    0.82
     feroit
    0.80
    Activation Density 0.318%

    No Known Activations