INDEX
    Explanations

    phrases indicating responsibility or actions of specific individuals or groups

    phrases that indicate accountability or responsibility

    Negative Logits
    "aukee"      -0.73
    "rique"      -0.70
    "ilial"      -0.70
    "ricane"     -0.70
    "anus"       -0.69
    "onut"       -0.67
    "poon"       -0.67
    "ruary"      -0.65
    "avorite"    -0.64
    " basil"     -0.64
    Positive Logits
    "...]"       0.72
    "offs"       0.62
    "ainer"      0.61
    " theoret"   0.60
    "erous"      0.59
    "aders"      0.57
    "urers"      0.57
    "uary"       0.57
    " attm"      0.57
    "ography"    0.57
    Activation Density: 0.015%

    No Known Activations