INDEX
    Explanations

    instances of the word "think" in various forms

    Negative Logits
    ez
    -0.17
    fred
    -0.15
    姿
    -0.15
    alent
    -0.15
    iser
    -0.15
    asar
    -0.14
    sher
    -0.14
    aná
    -0.14
    )((((
    -0.14
    /by
    -0.14
    Positive Logits
     about
    0.24
     twice
    0.23
     Twice
    0.22
    象
    0.18
    tank
    0.18
    _about
    0.18
    .about
    0.17
    -about
    0.17
     About
    0.16
    cape
    0.16
    Act Density 0.084%

    No Known Activations