INDEX
Explanations
years or dates in a specific format
locations and names of museums or theaters
Negative Logits
omething     -0.76
ccording     -0.72
vernment     -0.67
staking      -0.62
ensibly      -0.62
reconc       -0.59
scram        -0.59
sort         -0.58
scrambling   -0.58
compromises  -0.56
Positive Logits
********************************  0.71
<|endoftext|>                     0.71
aceae                             0.67
pmwiki                            0.66
0004                              0.65
UCHIJ                             0.65
NX                                0.64
Remix                             0.64
4090                              0.64
=====                             0.63
Activations Density 0.192%