    Explanations

    conjunctions and phrases indicative of conditions or consequences

    Negative Logits
    Token             Logit
    "<bos>"           -0.76
    "שְׁ"              -0.66
    "brainly"         -0.65
    " faſt"           -0.65
    " Diony"          -0.63
    " setId"          -0.62
    " unauthorised"   -0.61
                      -0.61
    " ſur"            -0.61
    "aryen"           -0.61
    Positive Logits
    Token             Logit
    ","               1.02
    ".,"              0.83
    " but"            0.83
    " however"        0.78
    "%,"              0.78
    "(),"             0.77
    "ViewFeatures"    0.77
    "′,"              0.77
    ",-,"             0.77
    "*,"              0.76
    Activation Density: 2.379%

    No Known Activations