INDEX
Explanations
the word "most" occurring in sentences
references to the majority or common consensus
Negative Logits
pload       -0.77
rompt       -0.74
Handling    -0.67
CARD        -0.67
heid        -0.65
Shepherd    -0.65
Fist        -0.62
icer        -0.62
instead     -0.61
Films       -0.60
Positive Logits
importantly  0.99
important    0.90
likely       0.86
afa          0.84
notably      0.76
likely       0.75
notable      0.74
egregious    0.73
interesting  0.71
interesting  0.71
Activations Density 0.084%