INDEX
Explanations
references to the "Avengers" franchise in various contexts
Negative Logits
itten    -0.17
oler     -0.17
away     -0.17
okud     -0.16
unte     -0.15
éĥ       -0.15
igated   -0.14
indow    -0.14
iele     -0.14
onet     -0.14
Positive Logits
ktop     0.16
oksen    0.16
ull      0.15
hes      0.15
anch     0.15
eye      0.14
eshire   0.14
ullo     0.14
roll     0.14
744      0.14
Activations Density 0.003%