INDEX
Explanations
complimentary phrases or words
Negative Logits
ogens       -0.81
NF          -0.72
orig        -0.71
redit       -0.71
Administ    -0.70
abad        -0.70
igate       -0.70
onics       -0.70
abies       -0.70
eligible    -0.69
Positive Logits
little      0.89
breeze      0.84
nice        0.82
fluffy      0.80
neat        0.79
bye         0.79
bye         0.79
gesture     0.79
ño          0.77
sounding    0.77
Activations Density 0.041%