INDEX
Explanations
negative phrases or expressions that indicate failure or decline
Negative Logits
Cyrus        -0.77
fireworks    -0.77
Mub          -0.72
Pengu        -0.72
ado          -0.72
ysis         -0.71
FML          -0.71
Thrones      -0.69
Tsukuyomi    -0.69
Clash        -0.69
Positive Logits
based        1.29
forward      1.26
burning      1.23
backed       1.14
roots        1.12
centered     1.10
centric      1.07
covered      1.07
related      1.05
bound        1.05
Activations Density: 0.037%