Explanations
references to time, events, or specific historical contexts
Negative Logits
ozem       -0.17
ncy        -0.16
orta       -0.16
skoro      -0.16
ufs        -0.15
reserve    -0.15
alette     -0.15
dostan     -0.14
empor      -0.14
orks       -0.14
Positive Logits
countdown  0.18
suo        0.18
los        0.17
Countdown  0.16
mismo      0.16
uso        0.16
same       0.15
sole       0.15
lo         0.15
ibo        0.15
Activations Density 0.020%