    Explanations

    phrases that indicate quantity or amount

    Negative Logits
    Token          Logit
    " pleaſure"    -0.85
    " Efq"         -0.79
    " ―――――"       -0.74
    " itſelf"      -0.72
    " Cæsar"       -0.70
    " Houſe"       -0.70
    " fhort"       -0.70
    " raiſ"        -0.70
    " NDEBUG"      -0.70
    " becauſe"     -0.69
    Positive Logits
    Token           Logit
    " of"           1.14
    " Of"           0.85
    " OF"           0.83
    "ReusableCell"  0.82
    " المعيارى"     0.81
    "ompok"         0.80
    "OfClass"       0.80
    "Of"            0.80
    " của"          0.76
    "unked"         0.75
    Activation Density: 0.147%

    No Known Activations