Explanations
comparative relationships and positive qualities
evaluative language that reflects opinions about individual worth and capability
Negative Logits
separat       -0.72
occurs        -0.71
VIDEOS        -0.71
Seg           -0.66
excludes      -0.66
disparity     -0.66
Demand        -0.66
ARM           -0.66
VERTISEMENT   -0.64
Conver        -0.64
Positive Logits
proud          1.05
happiest       0.99
lucky          0.93
wiser          0.90
happy          0.88
aware          0.86
laughing       0.85
pleased        0.85
interested     0.84
ointed         0.84
Activation Density: 0.648%