    Negative Logits
        "s"             0.63
        " "             0.55
        "date"          0.54
        "xiety"         0.52
        " bookmarks"    0.52
        "name"          0.52
        "ll"            0.51
        "represents"    0.50
        "b"             0.50
        "ed"            0.50
    Positive Logits
        " admire"       0.85
        " praised"      0.83
        " praising"     0.82
        " praise"       0.78
        " admired"      0.76
        " admiring"     0.74
        " admires"      0.74
        " admiration"   0.73
        " khen"         0.71
        " lauded"       0.70
    Act Density 0.041%

    No Known Activations