INDEX
Explanations
adjectives and nouns related to admiration, fascination, and disgust
emotional responses of admiration, affection, and enthusiasm towards various subjects
Negative Logits
reorgan      -0.65
mans         -0.64
redund       -0.64
helicop      -0.63
ModLoader    -0.61
antioxid     -0.60
ramid        -0.60
ewater       -0.60
umm          -0.60
peanuts      -0.60
Positive Logits
acy          0.88
ability      0.87
fulness      0.86
rence        0.79
lessness     0.78
worthiness   0.77
quot         0.76
toward       0.75
wart         0.75
ately        0.74
Activations Density: 0.095%