INDEX
Explanations
references to the concept of "holism" or related terms
Negative Logits
nx       -0.15
dash     -0.15
icz      -0.15
etto     -0.15
sburg    -0.14
esimal   -0.14
nable    -0.14
овый     -0.14
Cobra    -0.14
razy     -0.14
Positive Logits
hol        0.23
Hol        0.23
ocaust     0.20
istic      0.20
mium       0.20
omorphic   0.19
mega       0.19
Hol        0.18
ocene      0.18
ubar       0.18
Activations Density: 0.012%