INDEX
    Explanations

    cannot provide illegal/harmful information

    Negative Logits
    Token           Logit
    (whitespace)    0.77
    " some"         0.69
    " the"          0.65
    ","             0.64
    ":"             0.63
    " ("            0.61
    "."             0.61
    " polych"       0.58
    " three"        0.57
    " four"         0.57
    Positive Logits
    Token           Logit
    "would"         0.98
    " Would"        0.87
    "Would"         0.87
    " WOULD"        0.80
    " serait"       0.76
    " impedir"      0.75
    " avrebbe"      0.75
    "latego"        0.74
    " would"        0.73
    "任何人"         0.73
    Activation Density: 0.001%

    No Known Activations