INDEX
    Explanations

    instances of admiration and appreciation

    Negative Logits
    odoxy        -0.15
    ologic       -0.14
    ender        -0.14
    steam        -0.14
    主義         -0.14
    western      -0.14
    /Dk          -0.13
    ways         -0.13
    ivate        -0.13
    Ãłi          -0.13
    Positive Logits
    egas         0.16
    _CTL         0.15
    738          0.15
     ble         0.15
    acle         0.15
    ué           0.14
    thora        0.14
    ideographic  0.14
    iors         0.14
    ATA          0.14
    Activation Density: 0.008%

    No Known Activations