Explanations
phrases related to first impressions
references to initial impressions or superficial observations
NEGATIVE LOGITS
icer      -0.76
tailed    -0.73
ammy      -0.73
rus       -0.70
onement   -0.69
winter    -0.69
rop       -0.66
lez       -0.66
orders    -0.65
Joined    -0.65
POSITIVE LOGITS
blush            0.89
glance           0.89
superf           0.84
IMAGES           0.78
premise          0.71
intuitive        0.67
intuitive        0.66
ILD              0.65
ANE              0.64
understandable   0.64
Activation density: 0.138%
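Activation density is usually read as the share of token positions on which the feature fires, so 0.138% means the feature is active on roughly 1 or 2 tokens per thousand. The sketch below shows that arithmetic under one assumption: "fires" means a strictly positive activation. The function name `activation_density` and the toy data are hypothetical and only illustrate the idea. They are not the dashboard's actual code.

```python
def activation_density(activations):
    """Fraction of token positions where the feature activation is > 0.

    Assumption: a feature "fires" when its activation is strictly positive.
    """
    if not activations:
        raise ValueError("need at least one activation")
    firing = sum(1 for a in activations if a > 0)
    return firing / len(activations)


# Hypothetical toy data: the feature fires on 2 of 1,000 token positions.
acts = [0.0] * 998 + [1.3, 0.4]
print(f"{activation_density(acts):.3%}")  # prints 0.200%
```

A density this low fits a sparse feature that responds only to a narrow context, such as the "first impressions" phrasing described in the explanations.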