INDEX
Explanations
expressions and phrases related to skepticism and critique of authority or societal norms
Negative Logits
Cousins  -0.15
annex    -0.15
acen     -0.15
ýt       -0.15
"        -0.14
olk      -0.14
bat      -0.14
abol     -0.14
ạn       -0.14
witter   -0.14
Positive Logits
Ñģеб        0.17
onium       0.16
uÄį         0.15
onica       0.15
.spy        0.15
nite        0.15
Indented    0.14
alus        0.14
.metamodel  0.14
aus         0.13
Activations Density 0.327%