INDEX
Explanations
terms related to religious and individual freedoms
Negative Logits
ÄįÃŃ          -0.16
è¸ı           -0.15
figcaption    -0.15
.sync         -0.14
nell          -0.14
icrosoft      -0.14
Ø·Ùĩ          -0.14
ounder        -0.14
.cmb          -0.14
ustum         -0.13
Positive Logits
expression    0.38
speech        0.30
Expression    0.29
expression    0.29
assembly      0.29
religion      0.26
conscience    0.26
thought       0.25
Expression    0.24
-expression   0.24
Activations Density 0.025%