Explanations
phrases expressing personal feelings or opinions
expressions of personal feelings or comparisons
Negative Logits
ircraft    -0.86
hiba       -0.83
abases     -0.83
alt        -0.81
ouched     -0.76
alez       -0.75
ells       -0.71
arry       -0.70
ourse      -0.70
arling     -0.70
Positive Logits
lier       0.86
liest      0.73
parity     0.73
calling    0.71
crap       0.70
picking    0.67
slipping   0.66
spitting   0.66
lihood     0.65
¥µ         0.64
Activations Density: 0.025%