INDEX
    Explanations

    Instances where reasons or justifications are stated.

    Negative Logits
    Token                Logit
    "ształ"              -0.47
    " utafitiHapana"     -0.47
    " Vil"               -0.44
    "arim"               -0.42
    " оп"                -0.42
    " Am"                -0.42
    (not displayed)      -0.42
    (not displayed)      -0.41
    " An"                -0.39
    "']")"               -0.39
    Positive Logits
    Token                Logit
    "verwijspagina"       0.94
    "RegistryLite"        0.93
    " perché"             0.79
    " kasarigan"          0.78
    " porque"             0.77
    "porque"              0.76
    " because"            0.75
    "之所以"               0.74
    " becauſe"            0.74
    " Çünkü"              0.72
    Activation Density: 0.371%

    No Known Activations
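    The two summary statistics above can be illustrated with a small sketch. It assumes that activation density is the percentage of token positions where the feature's activation is positive, and that the logit lists are simply the most positive and most negative entries of a token-to-logit mapping. The function names `activation_density` and `top_logits` are hypothetical, and the activation values in the usage lines are invented for illustration; only the logit values are copied from the lists above.

```python
from typing import Dict, List, Sequence, Tuple


def activation_density(activations: Sequence[float]) -> float:
    """Percentage of token positions where the feature fires.

    Assumption: a position counts as active when its activation is > 0.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


def top_logits(
    logits: Dict[str, float], k: int
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, logit) pairs."""
    # Sort ascending by logit value so the most negative entries come first.
    ranked = sorted(logits.items(), key=lambda item: item[1])
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative


if __name__ == "__main__":
    # Logit values taken from the dashboard lists; leading spaces are part of the tokens.
    example_logits = {
        "verwijspagina": 0.94,
        " porque": 0.77,
        " because": 0.75,
        "ształ": -0.47,
        " An": -0.39,
    }
    pos, neg = top_logits(example_logits, k=2)
    print(pos)  # [('verwijspagina', 0.94), (' porque', 0.77)]
    print(neg)  # [('ształ', -0.47), (' An', -0.39)]
    # Invented activations: one of four positions is active.
    print(activation_density([0.0, 0.0, 1.3, 0.0]))  # 25.0
```

    Under this reading, a density of 0.371% means the feature is active on roughly 3.7 of every 1,000 tokens, which fits a feature tied to a narrow role such as introducing a reason.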