INDEX
    Explanations

    identifying as or not as

    Negative Logits
     некоторых    0.44
    简直    0.42
     основных    0.42
     alguma    0.41
     alguna    0.41
     algún    0.41
     যেসব    0.41
     algum    0.41
    なんか    0.40
     somehow    0.40
    Positive Logits
     ultimately    0.61
     fundamentally    0.54
     unequivocally    0.50
     firstly    0.50
     categorically    0.50
     emphatically    0.49
     uiteindelijk    0.49
     neither    0.48
     اولا    0.46
    Ultimately    0.46
    Act Density 0.046%

    No Known Activations