INDEX
Explanations
references to various forms of media
Negative Logits
gs          -0.20
so          -0.18
arna        -0.18
ma          -0.17
ment        -0.17
bage        -0.17
bas         -0.16
ning        -0.16
building    -0.16
bing        -0.16
Positive Logits
eval         0.28
outlets      0.21
outlet       0.18
/media       0.17
ieval        0.17
fak          0.17
tmpl         0.15
relations    0.15
arda         0.15
zioni        0.15
Activations Density 0.019%