Explanations
phrases related to the concept of something working or not working
instances of the word "work" and its variations in different contexts
Negative Logits
pora      -0.76
ilings    -0.73
anamo     -0.71
antha     -0.71
gow       -0.65
ensor     -0.64
gart      -0.64
olic      -0.64
agin      -0.64
xual      -0.62
Positive Logits
heet         1.09
bench        0.99
hops         0.93
overtime     0.90
miracles     0.88
seamlessly   0.86
wonders      0.84
smoothly     0.83
flaw         0.82
differently  0.82
Activations Density 0.059%