INDEX
Explanations
the word "most" in various contexts
references to the majority or commonality among groups or systems
Negative Logits
rompt      -0.92
heid       -0.82
instead    -0.70
icer       -0.69
alid       -0.67
moil       -0.67
pload      -0.64
thora      -0.62
vest       -0.62
nton       -0.61
Positive Logits
importantly    0.83
mornings       0.73
body           0.72
sane           0.72
observers      0.71
egreg          0.71
superficial    0.70
afa            0.69
ONE            0.69
millenn        0.67
Activations Density 0.049%