Explanations
the word "are" repeated multiple times
the word "are" in various contexts
Negative Logits
Token      Logit
OOL        -0.61
Nev        -0.60
uration    -0.60
erguson    -0.59
shape      -0.58
omez       -0.58
allery     -0.58
inosaur    -0.58
ulates     -0.58
ues        -0.57
Positive Logits
Token      Logit
nce         1.05
nces        1.00
tsky        0.90
nes         0.85
than        0.84
nt          0.83
atra        0.82
edia        0.81
lli         0.81
tto         0.81
Activation density: 0.014%