INDEX
Explanations
instances of the word "working" followed by different context
Negative Logits
antha        -0.79
Ukrain       -0.77
Bubble       -0.72
Flavoring    -0.69
constitu     -0.68
Augustus     -0.67
Ń·           -0.67
anamo        -0.67
ylon         -0.66
emonic       -0.66
Positive Logits
bench        1.21
ethic        1.20
aday         1.09
station      1.09
flows        1.08
hops         1.07
horse        1.01
forces       1.00
overtime     0.98
heet         0.97
Activations Density: 3.156%