INDEX
Explanations
references to freedom and its different aspects, particularly relating to religion and expression
Negative Logits
Kültür    -0.15
ング      -0.14
aska      -0.14
حاضر      -0.14
λικ       -0.14
BOOK      -0.14
urban     -0.14
ULD       -0.14
cha       -0.14
iled      -0.14
Positive Logits
fighters   0.28
Fighters   0.28
/lib       0.26
fighter    0.24
fighters   0.23
Fighter    0.22
loving     0.22
-loving    0.21
zes        0.20
fighter    0.19
Activations Density 0.025%