INDEX
Explanations
phrases related to size and proportion
the word "the"
Negative Logits
ulhu      -0.80
obo       -0.78
TAIN      -0.76
imi       -0.76
arettes   -0.75
arate     -0.74
ornia     -0.74
arine     -0.72
aunder    -0.71
udo       -0.71
Positive Logits
biggest      1.20
sheer        1.18
vast         1.17
absence      1.12
reality      1.12
specifics    1.11
majority     1.11
actual       1.10
underlying   1.09
presence     1.08
Activations Density 0.267%