INDEX
Explanations
mentions of New Year's celebrations and dates
Negative Logits
ector      -0.16
bjerg      -0.16
VERS       -0.15
curso      -0.15
zc         -0.15
versa      -0.14
jom        -0.14
weather    -0.14
jde        -0.14
legate     -0.14
Positive Logits
quist      0.18
ombat      0.16
adir       0.15
inki       0.15
dart       0.15
Bonus      0.15
ãĤ¤ãĥī     0.15
ramids     0.14
_simps     0.14
itez       0.14
Activations Density 0.005%