INDEX
    Explanations

    phrases indicating a comparison or specification of groups

    Negative Logits
        "<bos>"                 -0.62
        (whitespace token)      -0.56
        (whitespace token)      -0.44
        "AuthGuard"             -0.43
        "↵↵"                    -0.41
        "-"                     -0.41
        "kr"                    -0.40
        (token not rendered)    -0.40
        "b"                     -0.39
        "cing"                  -0.38
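    Positive Logits
        (Quoted tokens keep their leading space, so " Among" and "Among" are distinct tokens.)
        " AMONG"                1.24
        " Amongst"              1.16
        " Among"                1.10
        " CreateTagHelper"      1.03
        "Among"                 1.03
        " amongst"              1.01
        " Parmi"                1.00    (French: "among")
        "among"                 1.00
        " blant"                0.98    (Norwegian: "among")
        "Среди"                 0.97    (Russian: "among")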
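A common way to produce logit lists like the two above is to take the feature's decoder direction and dot it with each vocabulary token's unembedding vector; the largest and smallest scores become the positive and negative lists. The sketch below assumes that approach (the dashboard may also apply normalization first), and `feature_logits`, `top_k`, and the 3-dimensional toy vectors are hypothetical illustrations, not real model weights.

```python
from typing import Dict, List, Tuple

def feature_logits(decoder_dir: List[float],
                   unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Assumed method: direct logit effect = decoder direction . unembedding vector."""
    return {
        tok: sum(d * u for d, u in zip(decoder_dir, vec))
        for tok, vec in unembed.items()
    }

def top_k(logits: Dict[str, float],
          k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(logits.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative

# Toy 3-dimensional example with made-up vectors (hypothetical, not a real model).
decoder_dir = [1.0, 0.5, -0.5]
unembed = {
    " among": [1.0, 0.4, 0.0],
    " Among": [0.8, 0.2, 0.1],
    "<bos>": [-0.6, 0.0, 0.2],
    " the": [0.1, 0.0, 0.6],
}
logits = feature_logits(decoder_dir, unembed)
positive, negative = top_k(logits, k=2)
print("positive:", positive)
print("negative:", negative)
```

With real weights the same computation is a single matrix-vector product over the whole vocabulary; the dictionary form here just keeps the sketch dependency-free.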
    Activation Density: 0.100% (the feature is active on about 1 in 1,000 tokens)
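Activation density is typically the fraction of sampled tokens on which the feature's activation is above zero. The sketch below assumes a threshold of 0.0 and a hypothetical sample of 1,000 activations; the corpus and threshold the dashboard actually uses are not stated here, and `activation_density` is an illustrative name.

```python
from typing import Sequence

def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of tokens whose activation exceeds `threshold` (assumed 0.0)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)

# Hypothetical sample: the feature fires on 1 of 1,000 tokens.
acts = [0.0] * 999 + [3.2]
density = activation_density(acts)
print(f"Activation Density: {density:.3%}")  # prints "Activation Density: 0.100%"
```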

    No Known Activations