Explanations
Phrases containing the string "ings"; the higher activations may be related to technical discussions or instructions.
Negative Logits
Token       Logit
earthqu     -1.09
SIGN        -1.09
ãĥ¯ãĥ³      -1.02
Äĩ          -1.00
vier        -0.98
Effective   -0.96
Young       -0.94
Ub          -0.93
isons       -0.93
Durham      -0.93
Positive Logits
Token       Logit
hots        1.57
omething    1.51
tons        1.49
hot         1.46
poons       1.45
poon        1.41
peed        1.35
ystem       1.34
pace        1.31
layer       1.30
Activation Density: 1.607%
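The page does not define how the density figure is computed. A common reading, assumed here, is the fraction of token positions on which the feature's activation is above zero. The sketch below illustrates that reading with toy values that are not taken from this dashboard; the function name `activation_density` and the `threshold` parameter are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of token positions where the activation exceeds `threshold`.

    Assumption: "density" means the share of tokens on which the feature fires.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy per-token activations (illustrative only, not dashboard data).
sample = [0.0, 0.0, 0.0, 2.3, 0.0, 0.0, 0.0, 0.0, 0.7, 0.0]
print(f"{activation_density(sample):.3%}")  # prints 20.000%
```

Read this way, a density of 1.607% would mean the feature is active on roughly 16 of every 1,000 tokens in the sampled text.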