Explanations
expressing disapproval or indictment
Negative Logits
很重要  0.53
=)  0.46
Importance  0.45
畊  0.44
Weather  0.43
😊  0.43
:)  0.43
중요  0.43
ఉండే  0.42
ভাই  0.41
Positive Logits
hypocrisy  0.82
shameful  0.78
disgraceful  0.77
shame  0.73
shameless  0.70
Shame  0.69
absurdity  0.69
disgrace  0.68
pathetic  0.68
frankly  0.68
Activations Density: 0.010%