INDEX
    Explanations

    phrases indicating the presence of specific events or actions

    Negative Logits
    (token missing)     -0.58
    " propOrder"        -0.56
    "مصادر"             -0.54
    " EconPapers"       -0.54
    "ertor"             -0.50
    "awtextra"          -0.49
    " pleaſure"         -0.49
    "codiles"           -0.49
    "ſelf"              -0.48
    " trast"            -0.47
    Positive Logits
    " schonmal"         0.52
    "gnancy"            0.50
    " possu"            0.49
    " riguardo"         0.47
    " Anyways"          0.45
    " urma"             0.43
    " πως"              0.43
    "Anyways"           0.41
    " kì"               0.41
    " anyways"          0.41
    Activation Density: 0.124%

    No Known Activations