INDEX
Explanations
proper nouns related to people or places
repeated mentions of the word "work" in various forms
Negative Logits
ylon         -0.81
constitu     -0.70
idium        -0.70
Ukrain       -0.67
emonic       -0.64
Flavoring    -0.62
iren         -0.61
COMPLE       -0.60
fal          -0.60
Shal         -0.59
Positive Logits
ethic         1.13
hops          1.04
flows         0.96
collabor      0.94
arrang        0.93
station       0.91
mates         0.89
bench         0.89
hirt          0.87
tirelessly    0.87
Activations Density: 0.079%