INDEX
Explanations
phrases where one thing is more significant or powerful than another
concepts related to comparison and dominance in terms of quantity or significance
Negative Logits
lear        -0.73
cius        -0.65
BRA         -0.60
azel        -0.59
Renew       -0.59
alien       -0.59
oker        -0.58
cradle      -0.58
pring       -0.58
Directors   -0.58
Positive Logits
ighed       1.29
outweigh    0.97
ĺħ          0.89
¿½          0.86
outwe       0.85
00200000    0.77
onent       0.74
200000      0.74
ĸļ          0.73
swer        0.71
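The two logit lists show the feature's direct effect on next-token predictions: which tokens it pushes up and which it pushes down. Below is a minimal sketch of how such lists are commonly produced. It assumes the feature's decoder vector is multiplied by the model's unembedding matrix. Any final layer norm is ignored, and the function and argument names are hypothetical rather than any dashboard's actual API.

```python
def top_logit_effects(decoder_dir, unembed, vocab, k=10):
    """Hypothetical helper: project a feature's decoder direction through the
    unembedding and return the k most negative and k most positive tokens.

    decoder_dir: list of d_model floats (the feature's decoder vector)
    unembed:     d_model rows, each a list of vocab_size floats (W_U layout [d_model, vocab])
    vocab:       list of vocab_size token strings
    """
    d_model = len(decoder_dir)
    # Score for token t is the dot product of the decoder direction with column t of W_U.
    scores = [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(len(vocab))
    ]
    # Sort ascending by score: the front holds the most negative effects.
    ranked = sorted(zip(vocab, scores), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return negative, positive


if __name__ == "__main__":
    # Toy 2-dimensional model with a 3-token vocabulary.
    neg, pos = top_logit_effects([1.0, 0.0], [[0.5, -1.0, 2.0], [3.0, 3.0, 3.0]], ["a", "b", "c"], k=1)
    print(neg, pos)
```

Because only the top and bottom k entries are shown, tokens with near-zero effect never appear. That explains why the lists above stop at ten entries each.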
Activation Density: 0.019%
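Activation density is usually read as the percentage of token positions on which the feature fires. The minimal sketch below assumes that "fires" means an activation strictly greater than zero. The function name is hypothetical.

```python
def activation_density(activations):
    """Hypothetical helper: percentage of token positions where the feature
    activation is strictly positive (assumed definition of 'fires')."""
    if not activations:
        return 0.0
    firing = sum(1 for a in activations if a > 0)
    return 100.0 * firing / len(activations)


if __name__ == "__main__":
    # 19 firing positions out of 100,000 tokens.
    print(activation_density([1.0] * 19 + [0.0] * 99_981))
```

At 0.019%, the feature is active on roughly 19 of every 100,000 tokens. That makes it a very sparse feature.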