    Explanations

    phrases related to observations or insights

    phrases that indicate observation or witnessing events

    Negative Logits
    Token        Logit
    "nor"        -0.77
    "ranged"     -0.76
    "toe"        -0.74
    "orth"       -0.72
    "raft"       -0.68
    "save"       -0.67
    "anium"      -0.66
    "ãĥĦ"        -0.66
    "phe"        -0.66
    "ê"          -0.65
    Positive Logits
    Token              Logit
    " examples"        1.16
    " parallels"       1.14
    " similarities"    1.13
    " firsthand"       1.10
    " glimps"          1.09
    " instances"       1.04
    " hints"           1.03
    " signs"           1.02
    " flashes"         1.00
    " how"             0.98
    Act Density 0.142%
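
For readers unfamiliar with the density figure, here is a minimal sketch of one common way such a number is computed: the fraction of token positions on which the feature's activation is above a threshold. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical; the page itself does not state how its value was derived.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds threshold.

    Assumption: "density" means active positions / total positions.
    The source page does not confirm this definition.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: one active position out of eight.
sample = [0.0, 0.0, 0.0, 2.5, 0.0, 0.0, 0.0, 0.0]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.142% would mean the feature fires on roughly 1 or 2 of every 1,000 token positions sampled.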

    No Known Activations