INDEX
Explanations
mentions of a wide range or variety of options, features, or categories
instances of a specific token indicating the end of a text segment
Negative Logits
illard    -0.78
onis      -0.73
ador      -0.73
ilan      -0.72
ón        -0.69
ICAN      -0.68
nces      -0.67
Dul       -0.63
icks      -0.63
unia      -0.63
Positive Logits
swath         1.22
range         1.22
ranging       1.15
variety       1.13
array         1.10
spectrum      1.08
ranging       1.05
spread        1.04
assortment    1.00
scope         0.97
Activations Density: 0.032%