    Explanations

    references to political topics or entities

    Negative Logits
    -0.57
    i
    -0.50
    Autowired
    -0.49
     p
    -0.47
    p
    -0.46
    g
    -0.44
      
    -0.43
     ra
    -0.43
     l
    -0.43
     van
    -0.42
    Positive Logits
     Roskov
    1.18
    SharedCtor
    1.07
     myſelf
    1.06
     ſtate
    1.03
     pleaſure
    1.01
     itſelf
    0.98
     MenuView
    0.98
     fubject
    0.97
    ſelf
    0.97
     juſ
    0.96
    Activation Density: 0.232%

    No Known Activations