INDEX
    Explanations

    text following quotations

    Negative Logits
    Token             Value
    ","               0.55
    " logical"        0.53
    " linear"         0.52
    " caloric"        0.51
    " calculation"    0.50
    " transparent"    0.50
    " calculations"   0.50
    " neighborhood"   0.50
    " subsequent"     0.50
    " symbolic"       0.48
    Positive Logits
    Token             Value
    "You"             0.99
    "We"              0.98
    "My"              0.98
    "Our"             0.96
    "Б"               0.96
    "Texto"           0.93
    "Dear"            0.92
    "Wonderful"       0.92
    "Ю"               0.89
    "Мы"              0.88
    Activation Density: 0.560%

    No Known Activations