    Negative Logits
    Token     Logit
    "in"      -0.10
    "int"     -0.09
    "IN"      -0.09
    " int"    -0.08
    "ін"      -0.08
    " Sin"    -0.08
    "MM"      -0.07
    "10"      -0.07
    "-end"    -0.07
    "6"       -0.07
    Positive Logits
    Token     Logit
    " was"    0.21
    " were"   0.16
    "was"     0.15
    " Was"    0.15
    " WAS"    0.13
    "Was"     0.13
    " wasn"   0.13
    " Were"   0.12
    "were"    0.11
    "_was"    0.11
    Activation Density: 0.359%

    No Known Activations