INDEX
    Explanations

    words related to discovery and exploration

    Negative Logits
    statt      -0.18
    soever     -0.18
    unn        -0.17
    rott       -0.16
     Aviv      -0.15
    hee        -0.15
    quired     -0.15
    uracy      -0.15
    /she       -0.15
    igi        -0.15
    Positive Logits
    ies        0.24
    ry         0.23
    IES        0.21
    ability    0.20
    verse      0.19
    ogue       0.19
    ries       0.18
    ments      0.17
    ment       0.17
    ively      0.16
    Activation Density: 0.023%

    No Known Activations