Explanations
- Keywords related to instructions or processes
- Instances of the word "work" and its various contextual uses
Negative Logits
ilings      -0.83
antha       -0.80
ylon        -0.79
Flavoring   -0.74
anamo       -0.72
idium       -0.72
ModLoader   -0.70
iren        -0.66
xual        -0.64
fty         -0.64
Positive Logits
flows        1.08
bench        1.04
station      1.03
manship      0.94
heet         0.89
hops         0.89
ethic        0.86
river        0.83
horse        0.82
work         0.80
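The two logit lists above rank vocabulary tokens by how strongly this feature pushes their output logits up or down. A common way to produce such a ranking, which we assume here but which the dashboard itself does not state, is to project the feature's decoder direction onto each token's unembedding column. The sketch below uses pure Python with hypothetical names (`decoder_vec`, `unembed`, `vocab`) and toy data; it is an illustration of that assumed computation, not the dashboard's actual code.

```python
def logit_effects(decoder_vec, unembed, vocab):
    """Score each token by the dot product of the feature direction with its unembedding column.

    Assumed inputs (hypothetical):
      decoder_vec: list of d_model floats, the feature's decoder direction
      unembed:     d_model x vocab_size nested list, an unembedding matrix W_U
      vocab:       list of token strings, one per unembedding column
    """
    scores = []
    for t, token in enumerate(vocab):
        score = sum(decoder_vec[i] * unembed[i][t] for i in range(len(decoder_vec)))
        scores.append((token, score))
    return scores


def top_logits(scores, k):
    """Return the k most negative and k most positive (token, score) pairs."""
    ranked = sorted(scores, key=lambda pair: pair[1])
    negative = ranked[:k]         # most negative first
    positive = ranked[::-1][:k]   # most positive first
    return negative, positive


# Toy example: a 2-dimensional model with a 3-token vocabulary.
scores = logit_effects([1.0, -0.5], [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]], ["a", "b", "c"])
negative, positive = top_logits(scores, 1)
print(negative, positive)  # [('b', -0.5)] [('c', 2.0)]
```

With real model weights, calling `top_logits` with `k=10` would yield two ten-item lists shaped like the ones shown above.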
Activation Density: 0.080%
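Activation density is usually read as the share of token positions on which the feature fires. The sketch below assumes "fires" means a strictly positive activation and reports the result as a percentage. Both the threshold and the function name `activation_density` are our assumptions, not taken from the dashboard.

```python
def activation_density(activations):
    """Percentage of token positions where the feature's activation is > 0 (assumed threshold)."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


# Example: 8 firing positions out of 10,000 tokens.
acts = [0.0] * 9992 + [1.3] * 8
print(activation_density(acts))  # 0.08
```

At the reported density of 0.080%, the feature would be active on roughly 8 of every 10,000 tokens, which makes it a sparse feature.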