Explanations
expressions of strong positive sentiment
expressions of strong affection or enjoyment
Negative logits
Token       Logit
pter        -0.69
regretted   -0.69
suffice     -0.68
laim        -0.67
reluct      -0.66
authorized  -0.66
uthor       -0.64
ailable     -0.64
dilig       -0.63
ident       -0.61
Positive logits
Token       Logit
gravy       0.66
idea        0.66
animous     0.64
housing     0.60
decoration  0.59
NCT         0.58
Flip        0.58
simplicity  0.57
symmetry    0.57
guts        0.56
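The logit lists rank vocabulary tokens by how strongly this feature pushes their output logits down or up. The sketch below assumes, as is common for sparse-autoencoder feature dashboards but not stated on this page, that each value is the feature's decoder direction projected through the model's unembedding matrix. The helper `top_logit_effects` and the names `w_dec` and `W_U` are hypothetical, and a real dashboard may apply extra normalization.

```python
import numpy as np


def top_logit_effects(w_dec, W_U, vocab, k=10):
    """Return the k most negative and k most positive token effects.

    ASSUMPTION (hypothetical helper): a feature's logit effect is the plain
    projection of its decoder vector w_dec (shape: d_model) through the
    unembedding matrix W_U (shape: d_model x vocab_size). Real dashboards
    may also fold in layer-norm scaling or other normalization.
    """
    effects = np.asarray(w_dec) @ np.asarray(W_U)  # one value per vocab token
    order = np.argsort(effects)                    # ascending: most negative first
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


# Toy usage with random weights (not the real model's parameters).
rng = np.random.default_rng(0)
vocab = [f"tok{i}" for i in range(50)]
w_dec = rng.normal(size=16)
W_U = rng.normal(size=(16, 50))
negative, positive = top_logit_effects(w_dec, W_U, vocab, k=5)
for token, value in positive:
    print(f"{token:<12}{value:.2f}")
```

Sorting once and slicing from both ends yields the two lists in the same layout used above: most negative first, then most positive first.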
Activation density: 0.174%
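Activation density is usually read as the share of token positions on which the feature is active. At 0.174%, that is roughly 1.7 tokens in every 1,000. Here is a minimal sketch, assuming "active" means an activation strictly above zero. The `activation_density` helper and its `threshold` parameter are hypothetical and do not come from this page.

```python
import numpy as np


def activation_density(activations, threshold=0.0):
    """Percentage of token positions where the feature's activation exceeds threshold.

    ASSUMPTION (hypothetical helper): "active" means activation > threshold,
    with threshold defaulting to 0.
    """
    acts = np.asarray(activations, dtype=float)
    return 100.0 * np.count_nonzero(acts > threshold) / acts.size


# Toy example: the feature fires on 2 of 1,000 token positions.
toy_acts = np.zeros(1000)
toy_acts[[17, 503]] = [1.3, 0.4]
print(f"{activation_density(toy_acts):.3f}%")  # 0.200%
```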