INDEX
Explanations
phrases related to physical systems and their dynamics
Negative Logits
Token        Logit
Table        -0.64
anymore      -0.62
irgendwie    -0.59
voyez        -0.57
apapun       -0.57
anyway       -0.55
Anyway       -0.55
Anything     -0.55
Figure       -0.55
anything     -0.55
Positive Logits
Token          Logit
novel          0.73
demonstrate    0.69
Implications   0.66
demonstrates   0.64
demonstrating  0.61
illust         0.60
erstmals       0.60
implications   0.60
novel          0.57
show           0.57
Activations Density 1.353%