Explanations
words related to power and influence
Negative Logits
Token        Logit
ãĥ´ãĤ¡       -0.71
pled         -0.70
nant         -0.69
itives       -0.66
xit          -0.66
esters       -0.66
½            -0.62
ificantly    -0.62
ł            -0.61
NEY          -0.61
Positive Logits
Token        Logit
izabeth      1.24
ibrary       1.22
usive        1.11
iquid        1.07
ixir         1.04
uded         0.91
uxe          0.90
uding        0.86
ipt          0.86
ijah         0.85
Activations Density: 0.040%