INDEX
Explanations
References to specific individuals or entities, particularly names beginning with "Go" or "Gö".
Negative Logits
Token    Logit
sov      -0.16
upo      -0.16
ced      -0.15
enti     -0.15
enser    -0.15
antly    -0.14
omm      -0.14
vice     -0.14
nton     -0.14
erer     -0.14
Positive Logits
Token    Logit
VERN     0.25
/fwlink  0.22
ście     0.18
ebb      0.17
ethe     0.17
ekli     0.17
SSIP     0.16
mez      0.15
zilla    0.15
ogle     0.15
Activations Density: 0.029%