INDEX
Explanations
phrases or words related to specificity or uniqueness
Negative Logits
ーク     -0.75
udi      -0.72
clips    -0.72
runners  -0.70
OIL      -0.70
↑        -0.70
masters  -0.68
docs     -0.68
flex     -0.68
IVERS    -0.68
Positive Logits
piece        1.15
amount       1.14
destination  1.09
location     1.08
person       1.06
target       1.04
direction    1.00
number       0.99
outcome      0.99
subset       0.96
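Lists of top positive and negative logits like these are usually read as the feature's direct effect on the output vocabulary. The sketch below is minimal. It assumes, without support from the source, that each value is the feature's decoder direction multiplied by the model's unembedding matrix, with no layer norm or bias applied. `feature_logit_effects` and `top_and_bottom_tokens` are hypothetical helper names, and the toy matrix is illustrative only.

```python
from typing import List, Sequence, Tuple


def feature_logit_effects(
    decoder_direction: Sequence[float],
    unembedding: Sequence[Sequence[float]],
) -> List[float]:
    """Project a feature's decoder direction onto every vocabulary column.

    Assumption: `unembedding` is laid out as d_model rows by vocab columns,
    so the result holds one logit contribution per vocabulary token.
    """
    vocab_size = len(unembedding[0])
    return [
        sum(d * row[t] for d, row in zip(decoder_direction, unembedding))
        for t in range(vocab_size)
    ]


def top_and_bottom_tokens(
    logits: Sequence[float], vocab: Sequence[str], k: int = 10
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, logit) pairs."""
    order = sorted(range(len(logits)), key=lambda i: logits[i])
    negative = [(vocab[i], logits[i]) for i in order[:k]]            # most negative first
    positive = [(vocab[i], logits[i]) for i in reversed(order[-k:])]  # most positive first
    return positive, negative


# Toy example: d_model = 2, vocabulary of 4 hypothetical tokens.
W_U = [[1.0, -1.0, 0.0, 2.0],
       [0.0, 2.0, -2.0, 0.0]]
logits = feature_logit_effects([1.0, 0.5], W_U)
positive, negative = top_and_bottom_tokens(logits, ["a", "b", "c", "d"], k=2)
print(positive)  # [('d', 2.0), ('a', 1.0)]
print(negative)  # [('c', -1.0), ('b', 0.0)]
```

With a real model, the dashboard lists above would correspond to calling `top_and_bottom_tokens` with `k=10` on the full vocabulary.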
Activation Density: 0.102%
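Activation density is commonly read as the percentage of sampled token positions on which the feature fires. The sketch below assumes that definition, with "fires" meaning an activation strictly above a threshold of 0. The source does not state the threshold or the sample size. `activation_density` is a hypothetical helper name.

```python
from typing import Iterable


def activation_density(activations: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose feature activation exceeds `threshold`.

    Assumption: a feature counts as active when its activation is strictly
    greater than `threshold` (0.0 by default).
    """
    total = 0
    active = 0
    for a in activations:
        total += 1
        if a > threshold:
            active += 1
    if total == 0:
        raise ValueError("no activations supplied")
    return 100.0 * active / total


# Illustrative sample: 102 active positions out of 100,000 gives 0.102%.
sample = [1.0] * 102 + [0.0] * (100_000 - 102)
print(activation_density(sample))  # 0.102
```

Under this reading, a density of 0.102% means the feature fires on roughly one token in a thousand, which makes it a sparse feature.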