    NEGATIVE LOGITS
    Token           Logit
    " Tat"          -0.08
    "ніст"          -0.07
    "-$"            -0.07
    "-san"          -0.06
    " Sutton"       -0.06
    " earnings"     -0.06
    " mounts"       -0.06
    " hayvan"       -0.06
    " territor"     -0.06
    "Production"    -0.06
    POSITIVE LOGITS
    Token           Logit
    "Observer"       0.13
    " observer"      0.12
    "observer"       0.11
    " Observer"      0.10
    " observers"     0.08
    " OB"            0.08
    " addObserver"   0.07
    " Broker"        0.07
    " Ob"            0.07
    "_OCC"           0.07
    Act Density 0.002%

    No Known Activations