Explanations
concepts related to classification and categorization in various contexts
NEGATIVE LOGITS
Token     Logit
ys        -0.15
ilar      -0.15
zz        -0.15
urg       -0.15
elf       -0.14
248       -0.14
&E        -0.14
lit       -0.14
idis      -0.14
anders    -0.14
POSITIVE LOGITS
Token     Logit
-like     0.44
-style    0.32
like      0.30
-esque    0.29
-type     0.27
LIKE      0.23
ishly     0.22
_like     0.21
style     0.21
-Type     0.19
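Dashboards of this kind commonly derive positive and negative logit lists by projecting a feature's decoder direction through the model's unembedding matrix (W_U) and sorting the per-token scores. The sketch below assumes that method; it is not confirmed as the method used for this feature. The names `logit_effects` and `top_and_bottom`, the toy vocabulary, and all matrix values are hypothetical and exist only for illustration.

```python
def logit_effects(decoder_dir, unembed, vocab):
    """Score each token as the dot product of the feature's decoder
    direction with that token's column of the unembedding matrix.

    decoder_dir: list of d_model floats (hypothetical feature direction)
    unembed: d_model rows, each with len(vocab) entries (assumed W_U layout)
    """
    scores = {}
    for j, tok in enumerate(vocab):
        scores[tok] = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
    return scores


def top_and_bottom(scores, k):
    """Return the k highest-scoring tokens (descending) and the k lowest (ascending)."""
    top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
    bottom = sorted(scores.items(), key=lambda kv: kv[1])[:k]
    return top, bottom


# Toy example: d_model = 2, four-token vocabulary (all values invented).
vocab = ["-like", "-style", "ys", "zz"]
decoder_dir = [1.0, 0.5]
unembed = [
    [0.4, 0.3, -0.1, -0.3],
    [0.1, 0.0, -0.1, 0.1],
]
scores = logit_effects(decoder_dir, unembed, vocab)
top, bottom = top_and_bottom(scores, 2)
print([t for t, _ in top])     # ['-like', '-style']
print([t for t, _ in bottom])  # ['zz', 'ys']
```

Sorting separately for the top and bottom lists keeps each list in the order a dashboard would display it: strongest promotion first, strongest suppression first.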
Activation density: 0.496%
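Activation density is usually read as the share of tokens on which a feature fires. A minimal sketch, assuming a token counts as active when the feature's activation is strictly greater than zero and that per-token activations are available as a flat list; the function `activation_density` and the sample data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of tokens whose activation exceeds `threshold`.

    Assumption: a feature is "active" on a token when its activation is
    strictly greater than `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("need at least one token activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical activations over 1,000 tokens, 5 of which are non-zero.
acts = [0.0] * 995 + [1.2, 0.4, 3.1, 0.9, 2.2]
print(f"{activation_density(acts):.3f}%")  # 0.500%
```

Under this definition, a density of 0.496% means the feature is active on roughly 1 token in 200, which fits a sparse feature tied to a narrow pattern such as "-like"/"-style" suffixes.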