INDEX
    Explanations

    empires and societies

    Negative Logits
    auffi        -1.07
    Theſe        -1.07
    ſta          -1.06
    pleaſure     -1.05
    houſe        -1.02
    purpoſe      -1.00
    ſelves       -0.98
    Diſ          -0.98
    Reſ          -0.98
    Autoritní    -0.96
    Positive Logits
    (blank)      0.72
    in           0.70
    (blank)      0.62
    (            0.58
    [            0.58
    ma           0.57
    Re           0.57
    of           0.57
    In           0.54
    m            0.53
    Act Density 0.017%

    No Known Activations