    Negative Logits
     forbids          0.69
    Re                0.67
     그냥              0.67
     σαν              0.67
    ohms              0.67
    ີ່ມ               0.65
     tux              0.62
     {:?}",           0.61
    <=>               0.61
    ということです     0.61
    Positive Logits
     important        1.14
     важно            1.14
     noteworthy       1.14
     следует          1.12
     commendable      1.11
     prudent          1.06
     importante       1.02
    important         0.98
     варто            0.96
     fortunate        0.95
    Activation Density: 0.222%

    No Known Activations