Explanations
phrases related to social or political controversies
Negative Logits
DragonMagazine   -0.75
SpaceEngineers   -0.69
BOX              -0.69
Mothers          -0.67
²¾               -0.66
BOOK             -0.65
Wolves           -0.63
Rubin            -0.63
meal             -0.63
xon              -0.62
Positive Logits
irming    1.38
luence    1.32
ront      1.28
ixed      1.21
liction   1.17
irms      1.09
irmed     1.07
icion     1.05
ording    1.03
irm       0.96
Activations Density: 0.011%