    Negative Logits
    Token            Logit
    " sincere"       -0.08
    "yards"          -0.08
    " errs"          -0.08
    " lined"         -0.08
    "numer"          -0.08
    " precisa"       -0.07
    " disappoint"    -0.07
    " singles"       -0.07
    "116"            -0.07
    " yards"         -0.07
    Positive Logits
    Token            Logit
    "Toggle"         0.17
    " переключ"      0.16
    " toggle"        0.15
    " Toggle"        0.15
    " togg"          0.14
    ".toggle"        0.14
    "_toggle"        0.14
    "Switcher"       0.14
    "toggle"         0.14
    ".switch"        0.13
    Activation Density: 0.006%

    No Known Activations