INDEX
Explanations
instances of the word "news" in various contexts
Negative Logits
ci          -0.19
ndon        -0.18
ноз         -0.18
neau        -0.17
ex          -0.16
c           -0.15
ransition   -0.15
vt          -0.15
zelf        -0.15
FLAGS       -0.14
Positive Logits
letters     0.20
rp          0.19
lever       0.16
rising      0.16
flix        0.16
oleÄį       0.16
nika        0.15
reader      0.15
lobber      0.15
Č           0.15
Activations Density: 0.034%