INDEX
    Explanations

    is/be/become/actually classification

    Negative Logits
    " auxquels"      0.37
    " auxquelles"    0.35
                     0.35
    " відповідно"    0.34
    "гене"           0.34
    "ध्यान"           0.33
    " cuyas"         0.33
    "会出现"          0.32
    "衛生"            0.32
    " бажа"          0.32
    Positive Logits
    " classified"    0.54
    " categorized"   0.49
    " classed"       0.49
    " явля"          0.46
    " part"          0.46
    "classified"     0.45
    " actually"      0.45
    " являются"      0.44
    " classifies"    0.44
    " called"        0.43
    Act Density 0.072%

    No Known Activations