INDEX
Explanations
words related to abstract concepts or art
references to abstract concepts or representations
Negative Logits
odder      -0.75
UNCH       -0.72
kins       -0.69
attering   -0.67
risome     -0.67
cture      -0.67
owship     -0.67
rimp       -0.67
ggle       -0.66
ICAN       -0.66
Positive Logits
ions       1.21
matter     0.96
edly       0.93
edIn       0.86
stract     0.84
Matter     0.84
ured       0.83
urally     0.82
painter    0.80
syntax     0.78
Activations Density 0.030%