INDEX
    Explanations

    phrases indicating knowledge or awareness

    references to knowledge or awareness

    Negative Logits
    Token             Logit
    "inqu"            -0.79
    "phant"           -0.74
    "ãĤ´ãĥ³"          -0.74
    "vik"             -0.73
    "onding"          -0.65
    "cohol"           -0.64
    "verse"           -0.64
    " sidx"           -0.64
    "viks"            -0.64
    "reau"            -0.64
    Positive Logits
    Token             Logit
    " firsthand"      1.25
    " how"            1.19
    " instinctively"  1.12
    " exactly"        1.11
    " better"         1.10
    " what"           0.97
    " best"           0.96
    " intimately"     0.96
    " perfectly"      0.91
    "how"             0.89
    Act Density 0.088%

    No Known Activations