    Explanations

    phrases that indicate relationships, particularly in the context of implementation and effect

    Negative Logits
     Cæsar        -0.82
     Athenians    -0.72
     Huguen       -0.71
     Hadrian      -0.70
     Mahomet      -0.69
     Hopf         -0.68
     Majefty      -0.66
     Assyrian     -0.65
     Phry         -0.65
     Thebes       -0.64
    Positive Logits
     the          1.58
     a            1.10
     these        1.05
    )";\n         1.05
     their        1.05
     our          1.03
    .}(           1.03
     both         1.01
    "):\n         1.00
     an           0.96
    Activation Density: 8.062%

    No Known Activations