Explanations
references to astrology or astrological terms
Negative Logits
leans      -0.19
entanyl    -0.18
erse       -0.17
ráf        -0.16
radient    -0.16
rant       -0.16
otor       -0.16
lems       -0.16
rpc        -0.16
.metro     -0.16
Positive Logits
ounding    0.25
ute        0.24
ound       0.24
roph       0.23
ounded     0.23
.literal   0.22
igmat      0.22
utely      0.21
hma        0.21
UTE        0.20
Activations Density 0.007%