INDEX
Explanations
phrases related to achievements and records
Negative Logits
isan      -0.17
ppo       -0.15
elts      -0.15
ardin     -0.15
stants    -0.15
λÏī       -0.15
pis       -0.14
itorio    -0.14
nement    -0.14
elman     -0.14
Positive Logits
-breaking  0.40
breaking   0.38
-setting   0.38
breaking   0.33
setting    0.31
setting    0.29
Breaking   0.29
-high      0.27
Breaking   0.27
Setting    0.26
Activations Density 0.024%
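
As a rough illustration of what a density figure like this means, the sketch below assumes that activation density is the fraction of tokens on which the feature fires at all (activation above zero). That definition, the function name `activation_density`, and the sample data are assumptions made for illustration; they are not taken from this page. Under that reading, 0.024% corresponds to about 24 active tokens per 100,000.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumed definition, for illustration only.
    """
    if not activations:
        return 0.0
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: the feature fires on 24 of 100,000 tokens.
sample = [1.0] * 24 + [0.0] * 99_976
density = activation_density(sample)
print(f"{density:.3%}")
```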