INDEX
Explanations
words that signify inclusivity or generalization
Negative Logits
的是       -0.20
Various    -0.19
Things     -0.18
jin        -0.17
led        -0.16
人们       -0.16
stu        -0.16
each       -0.16
pper       -0.15
rette      -0.15
Positive Logits
/all          0.49
kind          0.41
sort          0.37
ones          0.34
place         0.34
kind          0.34
THING         0.32
thin          0.31
ht            0.26
combination   0.25
Activations Density 0.099%