INDEX
    Explanations

    references to Roman cultural elements and terms

    Negative Logits
    Token               Logit
    " Efq"              -0.95
    " JoJo"             -0.94
    " Monfieur"         -0.90
    " Gerd"             -0.89
    " CreateTagHelper"  -0.88
    " Volga"            -0.87
    " pleaſure"         -0.87
    " Psyche"           -0.86
    " Keim"             -0.86
    "otheses"           -0.84
    Positive Logits
    Token               Logit
    " rom"              0.86
    " Rom"              0.82
    "rom"               0.79
    "Rom"               0.79
    "ROM"               0.78
    "anz"               0.77
    " Romero"           0.73
    "𝟱"                 0.73
    " ROM"              0.70
    " Roman"            0.68
    Activation Density: 0.525%

    No Known Activations