    Explanations

    phrases that indicate focus or attention towards a subject or topic

    Negative Logits
    "PT"        -0.15
    "hest"      -0.15
    "ause"      -0.14
    "zim"       -0.14
    "ÑĢоÑĤив"   -0.14
    "nal"       -0.14
    "@nate"     -0.14
    "AVE"       -0.14
    "ãĤīãģı"    -0.14
    "colo"      -0.13
    Positive Logits
    " Tow"      0.16
    " toward"   0.15
    "à¹Ģà¸Ļ"    0.15
    "naÄį"      0.15
    " areas"    0.14
    " how"      0.14
    "627"       0.14
    "uzzi"      0.14
    "lix"       0.13
    "olan"      0.13
    Activation Density: 0.056%

    No Known Activations