INDEX
    Explanations
    NEGATIVE LOGITS
    Token          Logit
    " these"       -2.03
    "these"        -1.93
    "These"        -1.48
    " THESE"       -1.47
    " These"       -1.41
    "これらの"     -1.38
    " thefe"       -1.30
    " těchto"      -1.24
    " theſe"       -1.22
    " этих"        -1.20
    POSITIVE LOGITS
    Token            Logit
    " two"           0.75
    " same"          0.66
    " ideas"         0.65
    " three"         0.64
    " days"          0.63
    " kinds"         0.62
    " events"        0.61
    " words"         0.60
    " questions"     0.60
    (token missing)  0.60
    Activation Density: 0.113%

    No Known Activations