Explanations
phrases related to formal announcements or declarations
the word "For"
Negative Logits
itiz        -0.73
crawl       -0.64
forg        -0.62
zona        -0.58
eh          -0.57
looms       -0.56
bottleneck  -0.56
jaws        -0.55
chenko      -0.55
adr         -0.54
Positive Logits
bidden      1.52
gotten      1.50
ced         1.24
cing        1.19
give        1.15
example     1.12
bes         1.08
wards       1.06
instance    1.05
getting     1.02
Activation Density: 0.061%