INDEX
Explanations
references to "whole" or "entire" entities and concepts
Negative Logits
wholly        -0.15
絶            -0.14
toujours      -0.14
celkem        -0.14
propia        -0.14
.jasper       -0.14
fisse         -0.14
altogether    -0.13
inand         -0.13
pong          -0.13
Positive Logits
thing         0.41
spectrum      0.30
heart         0.29
ench          0.29
ties          0.29
thing         0.28
gam           0.27
length        0.26
process       0.26
range         0.26
Activations Density 0.052%