INDEX
    Explanations

    references to restaurants and dining experiences

    Negative Logits
    "'));\n"          -0.68
    " croy"           -0.68
    "věr"             -0.67
    " évêque"         -0.65
    " tremp"          -0.65
                      -0.64
    "]);\n"           -0.64
    "DropColumn"      -0.64
    "%;">"            -0.63
    " Jä"             -0.62

    Positive Logits
    " restaurants"    1.37
    " restaurant"     1.34
    " Restaurants"    1.23
    " Restaurant"     1.19
    " RESTAURANT"     1.09
    "Restaurants"     1.06
    "Restaurant"      1.05
    "restaurant"      0.99
    "restaurants"     0.98
    " restaurante"    0.97

    Act Density 0.050%

    No Known Activations