INDEX
    Explanations

    describes states or processes

    Negative Logits
    Token         Value
    "as"          0.55
    "inter"       0.45
    " rules"      0.44
    " essay"      0.44
    " eve"        0.44
                  0.43
    "vy"          0.43
    " bet"        0.43
    "MBC"         0.43
    " estaba"     0.43
    Positive Logits
    Token             Value
    "abhavena"        0.55
    " цели"           0.47
    "𒃶"               0.46
    " зве"            0.46
    "žite"            0.46
    " молодых"        0.46
    " possano"        0.45
    " специалист"     0.45
    "écution"         0.44
    " ગાં"             0.44
    Activation Density: 0.002%

    No Known Activations