INDEX
Explanations
references to specific individuals and their roles or actions
Negative Logits
alar      -0.17
nell      -0.17
argent    -0.15
regon     -0.15
ør        -0.14
669       -0.14
adi       -0.14
Mund      -0.14
iler      -0.14
avia      -0.14
Positive Logits
arb       0.15
AXB       0.15
ToMany    0.14
EMA       0.14
onna      0.14
rung      0.14
isay      0.14
.inline   0.14
CTX       0.14
_simps    0.14
Activations Density: 0.032%