Explanations
sentiments of love and appreciation towards a product or experience
love and hate expressions
Negative Logits
שוליים    -0.71
AutoScaleMode    -0.65
ſche    -0.64
ligiloj    -0.62
NameInMap    -0.60
ſta    -0.60
好文分享    -0.59
パンチラ    -0.58
بوابة    -0.57
<unused47>    -0.57
Positive Logits
enamor    0.44
hate    0.40
love    0.40
love    0.39
LOVE    0.38
Love    0.38
Love    0.36
hate    0.35
loves    0.35
loved    0.34
Activations Density 0.027%