Explanations
references to religious figures and political movements
Negative Logits
agara        -0.16
.cg          -0.15
Technique    -0.14
ilst         -0.14
fdc          -0.14
_plate       -0.14
しょ         -0.13
Resolve      -0.13
ρού          -0.13
Lanka        -0.13
Positive Logits
Sal          0.32
cler         0.29
Wah          0.26
Sun          0.25
Cler         0.24
ultra        0.24
Sal          0.24
Sun          0.23
wah          0.21
fundamental  0.21
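Lists like the two above are commonly produced by projecting a feature's decoder direction onto the model's unembedding matrix and ranking the per-token scores. The sketch below assumes that procedure, since the page does not say how its values were computed. The function names `logit_effects` and `top_bottom` and the toy two-dimensional vocabulary are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

def logit_effects(decoder_dir: Sequence[float],
                  unembed: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Score each token by the dot product of the feature's decoder direction
    with that token's unembedding vector (assumed method, hypothetical name)."""
    return {tok: sum(d * w for d, w in zip(decoder_dir, col))
            for tok, col in unembed.items()}

def top_bottom(effects: Dict[str, float],
               k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, score) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by score
    negative = ranked[:k]          # most negative first
    positive = ranked[::-1][:k]    # most positive first
    return positive, negative

# Toy example with a hypothetical 2-d residual stream and 3-token vocabulary.
effects = logit_effects([1.0, -0.5], {"a": [0.5, 0.0], "b": [0.0, 0.4], "c": [0.1, -0.2]})
positive, negative = top_bottom(effects, 2)
print(positive)   # tokens "a" then "c"
print(negative)   # tokens "b" then "c"
```

Because repeated entries such as "Sal" and "Sun" rank separately, they are most likely distinct vocabulary items (for example, with and without a leading space) whose difference was lost when the page was copied.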
Activation Density: 0.052%
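Activation density is usually read as the share of tokens on which the feature fires. Here is a minimal sketch, assuming that "fires" means an activation strictly above zero. The threshold and the name `activation_density` are assumptions. At 0.052%, the feature would be active on about 52 of every 100,000 tokens (0.00052 × 100,000 = 52).

```python
from typing import Iterable

def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose feature activation exceeds `threshold`.
    The default threshold of 0.0 is an assumption, not taken from the page."""
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total

# Hypothetical sample: 52 firing tokens out of 100,000.
acts = [1.0] * 52 + [0.0] * (100_000 - 52)
print(activation_density(acts))
```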