INDEX
Explanations
instances of the word "common" in various contexts
Negative Logits
tring      -0.17
LM         -0.16
anager     -0.16
HK         -0.15
dued       -0.14
uations    -0.14
indi       -0.14
/ph        -0.14
gio        -0.13
cer        -0.13
Positive Logits
wealth     0.27
rijk       0.18
ridge      0.17
/common    0.16
akra       0.15
ies        0.15
est        0.15
sense      0.14
ality      0.14
itized     0.14
Activations Density: 0.022%