INDEX
Explanations
phrases related to making errors or poor decisions
various types of decisions and mistakes in actions
Negative Logits
iolet     -0.64
adow      -0.64
Topics    -0.63
licks     -0.61
arling    -0.59
awed      -0.58
ongo      -0.58
aples     -0.57
ater      -0.57
redited   -0.56
Positive Logits
anew      0.79
liest     0.73
:]        0.70
olicy     0.69
Reloaded  0.64
wisely    0.61
bluntly   0.61
sarcast   0.61
same      0.60
ultimate  0.60
Activations Density: 0.214%