INDEX
    Explanations

    phrases indicating observation or perception

    Negative Logits
    Token        Logit
    "iga"        -0.15
    "isi"        -0.15
    "ier"        -0.15
    "ï¸ı"        -0.14
    "é"          -0.14
    "unt"        -0.14
    "igers"      -0.14
    "uss"        -0.14
    "iner"       -0.13
    "asc"        -0.13
    Positive Logits
    Token        Logit
    " evidence"  0.19
    "/he"        0.17
    " plenty"    0.16
    "kaar"       0.15
    " Evidence"  0.15
    "/read"      0.15
    "ohana"      0.14
    "753"        0.14
    "OMPI"       0.14
    "Unnamed"    0.14
    Activation Density: 0.116%

    No Known Activations