    Negative Logits
        " console"        -0.08
        " qualifier"      -0.08
        " has"            -0.07
        " spec"           -0.07
        " specif"         -0.07
        " teht"           -0.07
        " assistant"      -0.07
        " features"       -0.07
        " ranking"        -0.07
        " description"    -0.07
    Positive Logits
        " victims"         0.09
        "侵犯"             0.09   (Chinese: "to infringe; to violate")
        " celebrities"     0.09
        " Brennan"         0.09
        " pornography"     0.09
        " Arbitration"     0.09
        " offend"          0.08
        " blancos"         0.08
        "reef"             0.08
        (token missing)    0.08
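Lists like the two above are commonly produced by projecting a feature's decoder direction through the model's unembedding matrix and ranking the resulting per-token scores. The sketch below assumes that approach; the function names, the 3-dimensional vectors, and the 4-token vocabulary are all hypothetical toy data, not the values behind this feature. Some dashboards also rescale the scores; this sketch reports raw dot products.

```python
# Minimal sketch (assumption): a feature's direct logit effect on a token is the
# dot product of its decoder direction with that token's unembedding column.
# All data below is a hypothetical toy example, not values from this feature.


def logit_effects(decoder_dir, unembed):
    """Return {token: dot(decoder_dir, unembedding column for token)}."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, col))
        for tok, col in unembed.items()
    }


def top_and_bottom(effects, k):
    """Return (k most negative, k most positive) (token, value) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    negative = ranked[:k]        # most negative first
    positive = ranked[::-1][:k]  # most positive first
    return negative, positive


# Hypothetical 3-dimensional residual stream and 4-token vocabulary.
# Leading spaces are part of the token strings, as in the lists above.
decoder_dir = [1.0, 0.0, -1.0]
unembed = {
    " victims": [0.5, 0.2, -0.4],
    " console": [-0.3, 0.1, 0.5],
    "reef": [0.2, 0.0, 0.1],
    " spec": [0.0, 0.3, 0.2],
}

effects = logit_effects(decoder_dir, unembed)
neg, pos = top_and_bottom(effects, k=2)
print("Negative:", neg)
print("Positive:", pos)
```

Here " console" gets the most negative score and " victims" the most positive, mirroring how the lists above are ordered.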
    Activation Density: 0.004%
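If density is measured per token, 0.004% means the feature fires on roughly 4 of every 100,000 tokens. The sketch below assumes density is the percentage of token positions where the activation exceeds a threshold of 0; both the threshold and the helper name `activation_density` are hypothetical, and the sample data is invented for illustration.

```python
# Minimal sketch (assumption): density = share of token positions on which the
# feature's activation exceeds a threshold (0 here), reported as a percentage.


def activation_density(activations, threshold=0.0):
    """Percentage of positions where activation > threshold."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Hypothetical sample: 100,000 token positions, the feature fires on 4 of them.
acts = [0.0] * 100_000
for i in (10, 5_000, 42_000, 99_999):
    acts[i] = 1.3

print(f"{activation_density(acts):.3f}%")  # prints "0.004%"
```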

    No Known Activations