    Explanations

    phrases expressing similarity or addition

    Negative Logits
    Token            Logit
    ')";\n'          -0.69
    ' purpoſe'       -0.69
    ' ―――――'         -0.66
    ' faſt'          -0.66
    ' propOrder'     -0.66
    ' ſind'          -0.64
    ' ſtate'         -0.63
    ' AppColors'     -0.63
    ')");\n'         -0.62
    'ArrowToggle'    -0.62
    Positive Logits
    Token            Logit
    ' []:'           0.65
    ','              0.62
    '########.'      0.59
    'since'          0.56
    '!'              0.55
    ' since'         0.55
    '.'              0.54
    'adays'          0.54
    'endphp'         0.53
    ' ('             0.52
    Activation Density: 0.343%

    No Known Activations