INDEX
Explanations
phrases indicating evaluation, such as "bad" or "difficult"
phrases expressing difficulty or negative evaluations
Negative Logits
ãĤ½    -0.75
urd    -0.71
Offline    -0.68
gypt    -0.67
yth    -0.67
ystem    -0.65
last    -0.64
tv    -0.64
iHUD    -0.63
wake    -0.63
POSITIVE LOGITS
agher    0.73
nels    0.66
Carbuncle    0.66
elbows    0.65
harass    0.64
slicing    0.63
cruising    0.62
temptation    0.62
flavors    0.59
elbow    0.58
Activations Density: 0.127%