INDEX
    Explanations

    instances where permission or prohibition is discussed

    instances of the word "allow" and its variations

    Negative Logits
    "enegger"       -0.68
    " Soldier"      -0.68
    "nard"          -0.66
    "star"          -0.65
    "borough"       -0.64
    "bard"          -0.64
    "athan"         -0.63
    "kind"          -0.62
    "figure"        -0.62
    "kaya"          -0.61
    Positive Logits
    "Reviewer"      0.91
    " us"           0.71
    "ipient"        0.71
    "opol"          0.71
    "auga"          0.70
    " exemptions"   0.69
    " rapists"      0.67
    " disclaim"     0.66
    "Ĭ±"            0.65
    " exceptions"   0.65
    Activation Density 0.043%

    No Known Activations