INDEX
Explanations
phrases related to significant historical events and their impacts
Negative Logits
studied       -0.50
visited       -0.49
Studied       -0.47
studied       -0.46
watched       -0.45
Observed      -0.43
visited       -0.43
researched    -0.42
observed      -0.42
watched       -0.41
Positive Logits
led           1.59
enabled       1.49
helped        1.48
prompted      1.39
caused        1.36
helped        1.31
allowed       1.28
brought       1.27
prevented     1.22
spurred       1.20
Activations Density: 1.535%