INDEX
Explanations
references to hotels
Negative Logits
Matter     -0.73
Anarchy    -0.72
alez       -0.72
xit        -0.69
Prompt     -0.68
Izan       -0.68
ktop       -0.67
cale       -0.67
advers     -0.65
dit        -0.65
Positive Logits
hotel            1.00
hotels           0.99
Hotel            0.96
accommodations   0.96
guests           0.94
rooms            0.89
resorts          0.89
ibur             0.81
accommodation    0.80
lobb             0.78
Activations Density: 0.018%