Explanations
words related to successful or unsuccessful outcomes or results
repeated mentions of the word "work" or its variants, indicating discussions about workability or effectiveness of strategies
Negative Logits
Token      Logit
anamo      -0.85
antha      -0.81
ailable    -0.74
sbm        -0.66
76561      -0.64
pora       -0.64
ople       -0.63
xual       -0.63
Browse     -0.62
gart       -0.62
Positive Logits
Token        Logit
wonders      1.32
miracles     1.18
heet         1.09
harder       0.96
overtime     0.93
tirelessly   0.92
flaw         0.92
bench        0.92
brilliantly  0.91
against      0.91
Activations Density: 0.051%