INDEX
    Explanations

    phrases indicating something is not socially or morally appropriate

    words related to acceptability or unacceptability

    Negative Logits
    "ynthesis"    -0.83
    "ocket"       -0.80
    "frey"        -0.76
    "planes"      -0.75
    "ilant"       -0.73
    "lets"        -0.72
    "dream"       -0.72
    " helic"      -0.69
    "berry"       -0.69
    "wright"      -0.68
    Positive Logits
    "GoldMagikarp"    0.80
    " srfAttach"      0.75
    " CPC"            0.75
    "lihood"          0.73
    " deviations"     0.72
    "ible"            0.71
    "âĶĢâĶĢ"          0.70
    " precedent"      0.69
    "itable"          0.69
    "Gi"              0.68
    Act Density 0.029%
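
The activation density figure can be read as the share of sampled token positions on which the feature fires. The sketch below is only an illustration under that assumption: the function name `activation_density`, the zero threshold, and the sample data are hypothetical and are not taken from this page or from any particular tool.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of entries whose activation exceeds `threshold`.

    Assumption: "density" means the fraction of token positions with a
    nonzero (above-threshold) activation, expressed as a percentage.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 10,000 token positions, 3 of which activate the feature.
sample = [0.0] * 9997 + [1.2, 0.4, 2.5]
density = activation_density(sample)
print(f"Act Density {density:.3f}%")  # prints "Act Density 0.030%"
```

Under this reading, a density of 0.029% would correspond to roughly 29 active positions per 100,000 sampled tokens.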

    No Known Activations