Explanations
specific years or dates mentioned in the text
Negative Logits
stead      -0.64
malink     -0.62
monop      -0.61
ACTION     -0.60
amb        -0.59
Strongh    -0.59
hypocr     -0.58
metaph     -0.58
pressed    -0.56
plural     -0.56
Positive Logits
onwards    1.08
onward     0.87
å¹         0.85
-'         0.81
iversary   0.79
20439      0.72
â̲         0.72
ãģ®éŃĶ     0.67
berries    0.66
urst       0.65
Activations Density 0.101%