INDEX
Explanations
references to trips or travel-related terms
Negative Logits
Token       Logit
(missing)   -0.92
Kelle       -0.73
שוליים      -0.67
Nadel       -0.67
Kell        -0.66
cella       -0.65
Weiss       -0.65
Bracken     -0.65
Judi        -0.63
larıyla     -0.63
Positive Logits
Token       Logit
trip        1.64
Trips       1.58
trips       1.55
trip        1.48
trips       1.47
Trip        1.46
TRIP        1.40
Trip        1.40
Trips       1.37
TRIP        1.30
Activations Density: 0.072%