INDEX
    Explanations

    phrases that indicate positive experiences or evaluations

    Negative Logits
    Token            Logit
    " peligros"      -0.53
    " saveiro"       -0.52
    " peligro"       -0.50
    "styleUrls"      -0.50
    "ugc"            -0.49
    " skär"          -0.49
    " cementerio"    -0.48
    "OutputType"     -0.47
    " riscos"        -0.47
    " goles"         -0.47
    Positive Logits
    Token            Logit
    " been"          0.68
    "Been"           0.58
    " Been"          0.56
    "been"           0.54
    "HasBeen"        0.53
    " helpful"       0.48
    " throughout"    0.45
    " NUKAT"         0.45
    "一直"           0.45
    " taken"         0.45
    Activation Density: 0.012%

    No Known Activations