INDEX
Explanations
words related to philosophical and ideological concepts
Negative Logits
ses       -0.19
shan      -0.18
inator    -0.18
اء        -0.18
s         -0.18
sel       -0.17
lett      -0.17
flake     -0.17
sed       -0.17
iness     -0.17
Positive Logits
apolis    0.29
ity       0.25
ism       0.22
ization   0.20
ized      0.19
alysis    0.19
cy        0.19
zelf      0.19
ismus     0.19
opsis     0.19
Activations Density: 0.070%