Explanations
Words related to functionality or operation
Instances of the word "work" in various contexts
Negative Logits
ilings       -0.77
rition       -0.74
sbm          -0.72
Flavoring    -0.69
ildo         -0.68
gart         -0.67
anamo        -0.64
aez          -0.64
Gamble       -0.63
ewitness     -0.62
Positive Logits
flows        1.11
flaw         1.06
heet         1.02
correctly    1.01
seamlessly   1.00
reliably     1.00
properly     0.97
paces        0.94
smoothly     0.93
offline      0.93
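In feature dashboards of this kind, the positive and negative logit lists are commonly read as the feature's direct effect on the output vocabulary: the feature's decoder direction is projected through the unembedding matrix, and the highest and lowest scoring tokens are listed. The sketch below shows that ranking step under stated assumptions. The names `top_logit_effects`, `decoder_dir`, `W_U`, and the toy vocabulary are hypothetical, and the exact normalization this dashboard applies (for example, any layer-norm scaling) is not stated in the source.

```python
import numpy as np

def top_logit_effects(decoder_dir, W_U, vocab, k=10):
    """Rank vocabulary tokens by one feature's direct logit effect.

    Assumption: decoder_dir has shape (d_model,) and W_U has shape
    (d_model, d_vocab); no extra normalization is applied here.
    """
    effects = decoder_dir @ W_U          # one score per vocabulary token
    order = np.argsort(effects)          # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive

# Hypothetical toy model: 2-dimensional residual stream, 4-token vocabulary.
vocab = ["tok0", "tok1", "tok2", "tok3"]
W_U = np.array([[1.0, 0.5, -0.3, -0.8],
                [0.0, 0.5, -0.1, 0.2]])
decoder_dir = np.array([1.0, 0.0])

neg, pos = top_logit_effects(decoder_dir, W_U, vocab, k=2)
print(pos)  # → [('tok0', 1.0), ('tok1', 0.5)]
print(neg)  # → [('tok3', -0.8), ('tok2', -0.3)]
```

Because only the decoder direction and the unembedding are involved, these lists describe what the feature pushes the model to predict, not which inputs make it fire.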
Activation Density: 0.074%
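Activation density is usually reported as the percentage of sampled tokens on which the feature is active. The sketch below assumes "active" means an activation strictly above zero; the dashboard's actual threshold and sample size are not given in the source, and `activation_density` is a hypothetical helper. As a worked example, a feature that fired on 74 of 100,000 sampled tokens would show 0.074%.

```python
import numpy as np

def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature is active.

    Assumption: a token counts as active when its activation is
    strictly greater than `threshold`.
    """
    acts = np.asarray(activations)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size

# Hypothetical sample: 100,000 tokens, of which 74 activate the feature.
acts = np.zeros(100_000)
acts[:74] = 1.0
print(f"{activation_density(acts):.3f}%")  # → 0.074%
```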