INDEX
Explanations
instances of the word "working" in various contexts
Negative Logits
antha      -0.76
ylon       -0.71
Augustus   -0.66
emonic     -0.66
Bubble     -0.66
anamo      -0.64
constitu   -0.63
xual       -0.62
angular    -0.61
iren       -0.60
Positive Logits
tirelessly   1.13
bench        1.08
hops         1.03
diligently   1.03
collabor     0.99
ethic        0.98
overtime     0.97
heet         0.94
paces        0.86
station      0.85
Activations Density: 0.333%