INDEX
Explanations
negative descriptions or critiques
words related to vilification and criticism
Negative Logits
cropped                 -0.67
hr                      -0.65
newsp                   -0.61
(token text missing)    -0.60
hearty                  -0.60
Self                    -0.60
Sevent                  -0.60
Publisher               -0.59
Requ                    -0.59
Ģ                       -0.58
Positive Logits
vil        1.06
icious     0.86
chio       0.84
ibrary     0.81
ionage     0.80
theless    0.79
zx         0.79
Nadu       0.78
destro     0.78
arge       0.77
Activations Density 0.006%
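
For readers unfamiliar with this kind of feature card, the sketch below shows one common way such token lists and a density figure could be produced. It is an assumption, not something this page states: it takes the logit effect of a feature to be its decoder direction multiplied by the model's unembedding matrix, and the density to be the fraction of token positions with a positive activation. The names top_logit_tokens, activation_density, decoder_dir, unembed and vocab are hypothetical and chosen only for illustration.

```python
import numpy as np


def top_logit_tokens(decoder_dir, unembed, vocab, k=10):
    """Return (most negative, most positive) tokens by logit effect.

    Assumption: logit effect = decoder direction @ unembedding matrix.
    decoder_dir has shape (d_model,), unembed has shape (d_model, vocab_size).
    """
    effects = np.asarray(decoder_dir) @ np.asarray(unembed)  # (vocab_size,)
    order = np.argsort(effects)  # indices sorted from lowest to highest effect
    negative = [(vocab[i], float(effects[i])) for i in order[:k]]
    positive = [(vocab[i], float(effects[i])) for i in order[::-1][:k]]
    return negative, positive


def activation_density(activations):
    """Fraction of token positions where the feature fires (activation > 0)."""
    activations = np.asarray(activations)
    return float((activations > 0).mean())


if __name__ == "__main__":
    # Toy, made-up numbers; they are not taken from the feature above.
    vocab = ["a", "b", "c"]
    decoder_dir = np.array([1.0, 0.0])
    unembed = np.array([[0.5, -0.2, 1.0],
                        [9.0, 9.0, 9.0]])
    neg, pos = top_logit_tokens(decoder_dir, unembed, vocab, k=1)
    print("negative:", neg)
    print("positive:", pos)
    print("density:", activation_density([0.0, 0.0, 0.0, 2.0]))
```

Under this reading, a density of 0.006% would mean the feature is active on roughly 6 in every 100,000 token positions sampled.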