INDEX
Explanations
mentions of concerts and live performances
Negative Logits
vit          -0.17
erts         -0.16
ocene        -0.16
itarian      -0.15
ert          -0.15
laus         -0.15
icens        -0.15
consistent   -0.14
bert         -0.14
itar         -0.14
Positive Logits
aal          0.16
خی           0.15
oa           0.15
colo         0.15
razione      0.14
IVAL         0.14
elpers       0.14
-redux       0.14
uhn          0.14
wise         0.14
Activations Density 0.006%