Explanations
instances of the word "working" followed by other words
Negative Logits
antha       -0.82
Ukrain      -0.77
ylon        -0.73
Bubble      -0.73
iren        -0.70
wcs         -0.70
etsk        -0.69
emonic      -0.67
Augustus    -0.67
constitu    -0.67
Positive Logits
bench       1.23
ethic       1.20
hops        1.07
station     1.06
flows       1.05
aday        1.02
tirelessly  1.01
horse       0.99
forces      0.98
collabor    0.97
Activations Density: 1.139%