    Negative Logits
    Token            Logit
    " profiling"     -0.08
    " Lincoln"       -0.07
    ".profile"       -0.07
    " profanity"     -0.07
    "pro"            -0.07
    " Pro"           -0.07
    " Public"        -0.07
    " feature"       -0.06
    " mock"          -0.06
    "Modal"          -0.06
    Positive Logits
    Token              Logit
    " kokos"           0.10
    " tess"            0.10
    "neighbors"        0.10
    "_neighbors"       0.10
    "_geo"             0.10
    " مخروط"           0.10
    "_neighbor"        0.09
    " cannabinoids"    0.09
    " coque"           0.09
    "ighbor"           0.09
    Activation Density: 0.013%

    No Known Activations