INDEX
Explanations
high activation values across various sections of structured data
Negative Logits
Token              Logit
Бли                -0.66
control            -0.64
aktery             -0.63
Tok                -0.62
(token not shown)  -0.60
Bue                -0.60
hela               -0.59
ので               -0.59
Towel              -0.58
bule               -0.58
Positive Logits
Token    Logit
9        2.02
NINE     1.43
Ninth    1.42
ninth    1.32
Nine     1.31
nine     1.29
۹        1.29
ninety   1.28
ninth    1.25
nine     1.23
Activations Density 0.709%