INDEX
Explanations
phrases related to making decisions or expressing opinions
topics related to censorship and control
Negative Logits
Token                      Logit
BuyableInstoreAndOnline    -0.59
shown                      -0.59
ENGTH                      -0.53
Ĥª                         -0.52
»Ĵ                         -0.48
VERTISEMENT                -0.48
Hoo                        -0.47
inguished                  -0.46
INFO                       -0.46
ãĤ«                        -0.46
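Some entries above, such as ãĤ«, Ĥª and »Ĵ, look like encoding debris but are most likely raw token strings from a byte-level BPE vocabulary. The sketch below assumes the model uses GPT-2's byte-level tokenizer. The presence of BuyableInstoreAndOnline suggests this, but the page does not say so. Under that assumption, each printable character in a token maps back to exactly one byte. The helper names `bytes_to_unicode` (mirroring GPT-2's table) and `token_to_bytes` are illustrative.

```python
def bytes_to_unicode() -> dict[int, str]:
    """GPT-2 style reversible map from each byte value to a printable character."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    mapping = {b: chr(b) for b in printable}
    # Remaining bytes (controls, space, etc.) are shifted to code points 256, 257, ...
    n = 0
    for b in range(256):
        if b not in mapping:
            mapping[b] = chr(256 + n)
            n += 1
    return mapping


# Inverse table: printable character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def token_to_bytes(token: str) -> bytes:
    """Recover the raw bytes behind a byte-level BPE token string (assumed GPT-2 scheme)."""
    return bytes(UNICODE_TO_BYTE[ch] for ch in token)


for tok in ["ãĤ«", "Ĥª", "»Ĵ"]:
    raw = token_to_bytes(tok)
    # Invalid or incomplete UTF-8 sequences show up as U+FFFD replacement characters.
    print(tok, raw.hex(" "), raw.decode("utf-8", errors="replace"))
```

Under this mapping, ãĤ« is the complete UTF-8 sequence e3 82 ab (the katakana カ). Ĥª (82 aa) and »Ĵ (bb 92) consist only of continuation bytes, so they form characters only together with neighbouring tokens.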
Positive Logits
Token                      Logit
because                     1.25
whilst                      1.10
lest                        1.05
someday                     1.05
whenever                    1.05
unless                      1.00
whereas                     0.99
anymore                     0.98
but                         0.95
sooner                      0.92
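Logit lists like the two above are commonly produced by projecting the feature's decoder direction onto the model's unembedding matrix. Each vocabulary entry then receives one score, and the lists show the tokens at either extreme. The sketch below shows only that projection-and-sort step on toy numbers. The function names, the toy vocabulary and the matrices are hypothetical. Real dashboards may also fold in the final LayerNorm or apply other normalisation, which this sketch omits, and the page does not state which pipeline produced these values.

```python
def feature_logit_effects(w_dec: list[float], w_u: list[list[float]]) -> list[float]:
    """Score every token as the dot product of a feature's decoder vector with W_U.

    w_dec: decoder direction of one feature, length d_model (hypothetical input).
    w_u:   unembedding matrix, shape d_model x vocab_size (hypothetical input).
    """
    vocab_size = len(w_u[0])
    return [
        sum(w_dec[i] * w_u[i][t] for i in range(len(w_dec)))
        for t in range(vocab_size)
    ]


def top_and_bottom(effects: list[float], vocab: list[str], k: int):
    """Return the k most positive and k most negative (token, score) pairs."""
    order = sorted(range(len(effects)), key=lambda t: effects[t])
    bottom = [(vocab[t], effects[t]) for t in order[:k]]
    top = [(vocab[t], effects[t]) for t in reversed(order[-k:])]
    return top, bottom


# Toy example: d_model = 2, vocabulary of 4 made-up tokens.
vocab = ["tok_a", "tok_b", "tok_c", "tok_d"]
w_dec = [2.0, -1.0]
w_u = [
    [0.5, 0.1, -0.2, 0.3],
    [0.1, 0.4, 0.3, -0.2],
]
effects = feature_logit_effects(w_dec, w_u)
top, bottom = top_and_bottom(effects, vocab, k=2)
print("positive:", top)
print("negative:", bottom)
```

Sorting once and slicing both ends keeps the positive and negative lists consistent with each other. With a realistic vocabulary of tens of thousands of tokens, a vectorised matrix product would replace the nested loops.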
Activation density: 0.828%
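Activation density is usually read as the percentage of sampled tokens on which the feature fires. A value of 0.828% means roughly 8 of every 1,000 tokens. The sketch below assumes "fires" means an activation strictly above zero. The page states neither the threshold nor the token sample, and the function name is hypothetical.

```python
def activation_density(activations: list[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (assumed definition)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


# Toy sample: 1,000 tokens, of which the feature is active on 8.
sample = [0.0] * 992 + [1.5] * 8
print(f"{activation_density(sample):.3f}%")
```

A low density like this indicates a sparse feature, which fits the narrow set of connective words (because, whilst, lest, unless) among its positive logits.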