INDEX
Explanations
words related to promotion or acts of promoting
Negative Logits
*/(             -0.82
psons           -0.80
サーティワン    -0.76
ANG             -0.75
assian          -0.75
displayText     -0.74
partName        -0.71
Detected        -0.71
fuck            -0.70
カ              -0.68
POSITIVE LOGITS
abstinence      0.96
atheism         0.91
separat         0.90
boycot          0.83
virtues         0.81
tolerance       0.80
wellness        0.80
unity           0.80
patriotism      0.80
authenticity    0.79
Activations Density 0.082%