INDEX
Explanations
phrases indicating emphasis or importance
mentions of "one" in various contexts
Negative Logits
heny         -0.72
osponsors    -0.70
xton         -0.67
hips         -0.66
inders       -0.62
ories        -0.62
owed         -0.61
orically     -0.61
atted        -0.60
ivated       -0.59
Positive Logits
esan          0.89
Hundred       0.82
hundred       0.74
sided         0.74
stice         0.69
handed        0.68
dimensional   0.68
hots          0.67
uscript       0.65
alian         0.65
Activations Density: 0.055%