Explanations
phrases indicating urgency or immediacy
Negative Logits
azio        -0.15
wang        -0.14
ataka       -0.14
Pointer     -0.14
Bars        -0.13
243         -0.13
erais       -0.13
Brands      -0.13
sare        -0.13
ith         -0.13
Positive Logits
åĪĹ         0.17
ller        0.16
ousel       0.15
URA         0.15
że          0.15
çµ¶         0.14
liest       0.14
ATEGORY     0.14
iples       0.14
Forbidden   0.14
Activations Density: 0.005%