    Explanations

    Instances of the word "think" or phrases expressing opinions and thoughts.

    Negative Logits
      "ez"       -0.17
      "enis"     -0.15
      "ardo"     -0.15
      "жен"      -0.15
      "MBER"     -0.14
      "飯"       -0.14
      "ptal"     -0.14
      "/wiki"    -0.13
      "caff"     -0.13
      "itol"     -0.13

    Positive Logits
      "atti"      0.18
      "arris"     0.15
      "able"      0.15
      "باش"       0.15
      "rolls"     0.15
      "ching"     0.14
      "chia"      0.14
      " it"       0.14
      "tank"      0.14
      "enton"     0.14
    Activation density: 0.037%
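    The figure above can be read with a little arithmetic. The sketch below assumes that activation density means the fraction of sampled token positions on which the feature's activation is strictly positive; that definition, the helper names `activation_density` and `expected_active_tokens`, and the example activations are all hypothetical and for illustration only. Under that assumption, 0.037% works out to roughly 37 active positions per 100,000 tokens.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption (hypothetical definition): a position counts as active
    when its activation is strictly greater than zero.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


def expected_active_tokens(density_percent, num_tokens):
    """Expected number of active positions for a density given in percent."""
    return density_percent / 100 * num_tokens


# Hypothetical activations for 8 token positions (illustrative values only).
acts = [0.0, 0.0, 1.3, 0.0, 0.0, 0.0, 0.0, 0.0]
print(activation_density(acts))  # 0.125

# Interpreting the reported 0.037%: about 37 active tokens per 100,000.
print(round(expected_active_tokens(0.037, 100_000)))  # 37
```

    At this density the feature is very sparse, so a small sample of text can easily contain no activations at all.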

    No Known Activations