Explanations
instances where something is applicable or relevant to a majority
references to the word "most" highlighting its significance or prevalence in various contexts
Negative Logits
rompt      -0.86
heid       -0.79
icer       -0.75
pload      -0.73
alid       -0.66
Films      -0.64
orld       -0.64
instead    -0.63
vest       -0.62
thora      -0.62
Positive Logits
importantly    0.84
ONE            0.79
mornings       0.79
important      0.73
superficial    0.73
mundane        0.71
likely         0.71
observers      0.70
ones           0.70
body           0.70
Activations Density: 0.057%