Explanations
words related to different levels of anger or intensity
the word "mad" in various contexts and expressions of emotional intensity
Negative Logits
Token            Logit
Lay              -0.67
Ĥİ               -0.61
çĦ               -0.60
ILA              -0.59
Vert             -0.59
advertisement    -0.58
AFB              -0.58
FANT             -0.57
Citation         -0.57
Dayton           -0.57
Positive Logits
Token            Logit
cap              1.24
agascar          1.17
rid              1.12
onna             1.07
der              1.02
ame              0.93
ras              0.93
rig              0.93
cow              0.90
men              0.89
Activations Density: 0.026%