INDEX
Explanations
mentions of freedom or liberation concepts
Negative Logits
Token         Logit
Vita          -0.17
Gap           -0.16
Brotherhood   -0.15
leta          -0.15
Gap           -0.15
gap           -0.14
beaten        -0.14
azzi          -0.14
Ha            -0.14
ika           -0.14
Positive Logits
Token         Logit
irty          0.16
artz          0.16
opoulos       0.15
zel           0.15
note          0.15
pmat          0.15
ZO            0.14
.Mouse        0.14
uben          0.14
ãģĻãģĻ        0.14
Activations Density: 0.012%