    Negative Logits
        " be"          -0.26
        " Be"          -0.22
        "be"           -0.20
        " будь"        -0.18
        "(be"          -0.18
        " guilty"      -0.17
        "Be"           -0.17
        "lec"          -0.16
        " STILL"       -0.16
        " بتوان"       -0.15
    Positive Logits
        " originally"   0.25
        "actic"         0.23
        " previously"   0.19
        " formerly"     0.19
        " invent"       0.18
        " initially"    0.18
        " Originally"   0.17
        " happen"       0.17
        " fare"         0.17
        "antha"         0.16
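The two lists rank vocabulary tokens by how strongly this feature pushes their logits down (negative) or up (positive). On dashboards of this kind the score is usually the feature's direct logit effect: its decoder direction multiplied by the model's unembedding matrix. Below is a minimal sketch of that ranking. It assumes NumPy and uses hypothetical inputs: `w_dec` (the feature's decoder vector, length d_model), `W_U` (the unembedding matrix, d_model x vocab size) and `vocab` (token strings). None of these names come from this page.

```python
import numpy as np

def top_logits(w_dec, W_U, vocab, k=10):
    """Return the k most negative and k most positive (token, logit) pairs.

    w_dec, W_U and vocab are hypothetical placeholders: a decoder direction
    of shape [d_model], an unembedding of shape [d_model, vocab_size], and
    the matching list of token strings.
    """
    logits = w_dec @ W_U              # direct effect of the feature on every vocab logit
    order = np.argsort(logits)        # indices sorted ascending: most negative first
    negative = [(vocab[i], float(logits[i])) for i in order[:k]]
    positive = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]  # descending
    return negative, positive

# Toy usage with a 2-dimensional model and a 4-token vocabulary.
w_dec = np.array([1.0, -1.0])
W_U = np.array([[0.2, -0.3, 0.1, 0.0],
                [0.0,  0.1, -0.2, 0.05]])
neg, pos = top_logits(w_dec, W_U, ["a", "b", "c", "d"], k=2)
print(neg, pos)
```

Keeping the signed scores, rather than only their magnitudes, is what allows a single sort to produce both the negative and the positive list.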
    Activation Density: 0.066%

    No Known Activations
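Activation density is the share of sampled tokens on which the feature fires. A density of 0.066% works out to roughly 66 active tokens per 100,000, or about one token in 1,500, which makes this a sparse feature. The sketch below rests on two assumptions. First, a token counts as active only when its activation is strictly positive. Second, the input is a hypothetical array `acts` of per-token activations. How the dashboard figure was actually sampled is not stated here.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Percentage of tokens whose activation is strictly above `threshold`.

    `acts` is a hypothetical 1-D array of per-token feature activations.
    """
    acts = np.asarray(acts)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Toy usage: 66 active tokens out of 100,000 gives about 0.066%.
acts = np.zeros(100_000)
acts[:66] = 1.0
print(activation_density(acts))
```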