INDEX
Explanations
words related to decisions being made
occurrences of the word "made"
Negative Logits
token       logit
sit         -0.69
ding        -0.66
decaying    -0.66
rotting     -0.61
stripes     -0.60
peppers     -0.58
bos         -0.58
wit         -0.58
grou        -0.57
striped     -0.57
Positive Logits
token       logit
emort       0.94
sure        0.86
ensibly     0.82
gements     0.79
itives      0.78
urate       0.78
ezvous      0.77
awaru       0.76
oji         0.76
vious       0.76
Activations Density 0.045%
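
To make the density figure concrete, here is a minimal sketch of how such a number could be computed, assuming "activations density" means the fraction of token positions on which the feature fires at all. That definition, the function name `activation_density`, the `threshold` parameter, and the sample data are all assumptions for illustration; the page itself does not say how the value was obtained.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: a position counts as "active" when its activation is
    strictly greater than the threshold (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 9 active positions out of 20,000 tokens.
sample = [0.0] * 19_991 + [1.3] * 9
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.045% corresponds to roughly 45 active positions per 100,000 tokens.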