INDEX
Explanations
phrases referring to a majority or a high frequency of something
references to the word "most" indicating commonality or majority
Negative Logits
rompt        -0.91
edin         -0.70
icer         -0.66
thora        -0.66
wings        -0.65
hawk         -0.63
orld         -0.62
moil         -0.62
Report       -0.61
alid         -0.61
Positive Logits
importantly  0.97
afa          0.87
mornings     0.82
workplaces   0.78
sane         0.78
observers    0.78
Americans    0.77
cases        0.77
of           0.75
THING        0.75
Activations Density: 0.050%