INDEX
Explanations
sentences containing the word "unique"
Negative Logits
Token      Logit
Worse      -0.83
Concern    -0.70
UGH        -0.67
intel      -0.65
shit       -0.64
worn       -0.64
Wr         -0.63
OH         -0.62
idia       -0.62
lest       -0.60
Positive Logits
Token          Logit
simplicity     1.11
versatility    1.07
flexibility    0.89
inexpensive    0.88
combines       0.87
allows         0.85
streamlined    0.84
avoids         0.83
unlike         0.83
seamlessly     0.80
Activations Density: 0.611%