Explanations
references to New Year's celebrations and associated events
Negative Logits
yard     -0.16
ernes    -0.15
beyond   -0.14
.te      -0.14
714      -0.13
ah       -0.13
yard     -0.13
жив      -0.13
itary    -0.13
ortal    -0.13
Positive Logits
Eve          0.26
eve          0.23
EVE          0.20
resolutions  0.20
Resolution   0.18
eve          0.18
/start       0.17
resolution   0.17
-resolution  0.16
theless      0.16
Activations Density: 0.006%