Explanations
awe, amazement, fascination, admiration
Negative Logits
sukh            0.77
<!--<           0.72
unpleasant      0.67
undesirable     0.64
nonatomic       0.63
🙅              0.62
ńskiej          0.62
Redox           0.62
suicide         0.62
ໂ               0.61
Positive Logits
awe             2.46
amazement       2.07
admiration      1.99
wonder          1.94
amazed          1.89
marvel          1.88
aw              1.84
fascination     1.76
fascinated      1.72
astonishment    1.65
Activation density: 0.222%