INDEX
Explanations
specific mentions of prepositions followed by certain nouns
Negative Logits
ν            -0.65
̶            -0.62
antics       -0.61
wig          -0.61
amus         -0.60
ruciating    -0.60
ÏĤ           -0.59
wrench       -0.57
vor          -0.57
inos         -0.57
Positive Logits
screen       1.19
behalf       1.10
erous        1.04
site         1.02
billboards   0.99
ibaba        0.93
Pastebin     0.92
sets         0.90
shore        0.89
eBay         0.89
Activations Density: 0.145%