INDEX
Explanations
references to fire-related topics or terms
Negative Logits
zug     -0.18
hlas    -0.16
rej     -0.16
onet    -0.15
jen     -0.15
sk      -0.15
keit    -0.14
rij     -0.14
hari    -0.14
hors    -0.14
Positive Logits
nze     0.25
works   0.24
places  0.22
bird    0.21
walls   0.21
ball    0.21
alarm   0.20
work    0.20
fly     0.20
brand   0.20
Activations Density 0.016%