INDEX
Explanations
mentions of "most" related to quantitative evaluations
Negative Logits
rompt     -0.85
vest      -0.83
icer      -0.79
heid      -0.79
pload     -0.79
Mellon    -0.76
instead   -0.75
alid      -0.72
adium     -0.70
thur      -0.70
Positive Logits
importantly   1.35
afa           0.96
notably       0.96
body          0.93
rar           0.92
important     0.89
likely        0.88
likely        0.86
observers     0.85
egreg         0.83
Activations Density: 13.629%