INDEX
Explanations
references to specific names and titles related to cultural works
Negative Logits
urga     -0.19
urse     -0.16
iets     -0.15
mbox     -0.14
ousse    -0.14
het      -0.14
okes     -0.14
urses    -0.13
habi     -0.13
вк       -0.13
Positive Logits
arella       0.16
αÏģα         0.14
PRO          0.14
åĸ           0.14
amba         0.14
brom         0.14
[Index       0.14
ynos         0.14
iang         0.14
_INTERFACE   0.14
Activations Density: 0.011%