    Negative Logits
        Token        Logit
        "yon"        -0.75
        "wagen"      -0.71
        "haps"       -0.70
        "inating"    -0.70
        "ivan"       -0.70
        "itance"     -0.70
        "itary"      -0.69
        "rals"       -0.68
        "chev"       -0.67
        "igmat"      -0.66
    Positive Logits
        Token        Logit
        "fast"        1.07
        " news"       0.90
        " NEWS"       0.87
        "AKING"       0.85
        " News"       0.85
        " Bad"        0.77
        "views"       0.77
        "NEWS"        0.77
        " Breaking"   0.76
        "news"        0.76
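    The dashboard does not state how these logits were computed. A common approach for sparse-autoencoder features, assumed here rather than confirmed by the source, is to project the feature's decoder direction through the model's unembedding matrix, which yields one logit per vocabulary token, and then list the most negative and most positive entries. The minimal sketch below covers only that final ranking step. `top_and_bottom_logits` is a hypothetical helper, and the sample dictionary reuses a few token/logit pairs from the tables above.

```python
def top_and_bottom_logits(logits: dict[str, float], k: int = 10):
    """Return (bottom, top): the k most negative and k most positive (token, logit) pairs.

    Hypothetical helper; assumes `logits` already maps each token string
    (leading spaces kept, e.g. " news") to the feature's logit for that token.
    """
    # Ascending order: the most negative logits come first.
    bottom = sorted(logits.items(), key=lambda item: item[1])[:k]
    # Descending order: the most positive logits come first.
    top = sorted(logits.items(), key=lambda item: item[1], reverse=True)[:k]
    return bottom, top


# A few entries copied from the dashboard tables, used as sample input.
sample = {
    "yon": -0.75,
    "wagen": -0.71,
    "igmat": -0.66,
    "fast": 1.07,
    " news": 0.90,
    "news": 0.76,
}
bottom, top = top_and_bottom_logits(sample, k=2)
print("most negative:", bottom)
print("most positive:", top)
```

    Sorting the whole vocabulary is fine for a sketch; with a real vocabulary of tens of thousands of tokens, a partial selection such as `heapq.nsmallest`/`heapq.nlargest` avoids the full sort.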
    Activation Density: 0.016%
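    Activation density is usually read as the fraction of tokens on which the feature fires; that definition, and the firing threshold of zero, are assumptions here, since the dashboard does not define the metric. Under that reading, 0.016% corresponds to roughly 16 of every 100,000 tokens. The sketch below computes the fraction for a list of per-token activations; `activation_density` is a hypothetical helper and the input is toy data, not values from this feature.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition of "activation density"; the threshold of 0.0
    (any positive activation counts as firing) is also an assumption.
    """
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return fired / len(activations)


# Toy data (not from the dashboard): 2 firing tokens out of 10,000.
acts = [0.0] * 9_998 + [0.5, 1.2]
density = activation_density(acts)
print(f"density = {density:.4%}")
```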

    No Known Activations