INDEX
    Explanations

    instances of reference and commentary on perceptions or beliefs

    Negative Logits
    Token      Logit
    "oton"     -0.16
    "olio"     -0.16
    "itta"     -0.15
    "ofi"      -0.15
    "inte"     -0.15
    "erb"      -0.15
    "uide"     -0.15
    "ozo"      -0.14
    "itas"     -0.14
    "wers"     -0.14
    Positive Logits
    Token         Logit
    " instead"    0.38
    " merely"     0.36
    "Instead"     0.33
    " Instead"    0.32
    "nor"         0.31
    " nor"        0.30
    "instead"     0.30
    " simply"     0.27
    "Nor"         0.27
    " Nor"        0.27
    Activation Density: 0.247%

    No Known Activations