Explanations
references to large, disruptive parties
Negative Logits
remar      -0.18
lunch      -0.18
lunches    -0.16
Lunch      -0.16
tea        -0.16
pta        -0.16
ffee       -0.15
imers      -0.15
coffee     -0.14
andin      -0.14
Positive Logits
party      0.39
PARTY      0.30
party      0.30
-party     0.29
Party      0.29
Party      0.28
_party     0.28
parties    0.28
.party     0.22
bac        0.22
Activations Density 0.117%