Explanations
references to the color red or related concepts
Negative Logits
boru      -0.21
dna       -0.18
dro       -0.16
είο       -0.15
hend      -0.15
teki      -0.15
kok       -0.15
dale      -0.14
ebo       -0.14
Charg     -0.14
Positive Logits
ress      0.31
raft      0.30
tape      0.28
flag      0.26
istrict   0.26
ressing   0.25
-flag     0.25
flags     0.24
ocument   0.24
resses    0.24
Activations Density: 0.011%