Explanations
references to characters or entities in a narrative context
Negative Logits
leş            -0.14
awl            -0.13
kas            -0.13
šli            -0.13
agn            -0.13
oins           -0.13
Sap            -0.13
rana           -0.13
arna           -0.13
placeholders   -0.13
Positive Logits
aforementioned                  0.26
刚才 (Chinese, "just now")      0.20
выше (Russian, "above")         0.19
afore                           0.17
ulty                            0.15
mentioned                       0.15
IMIT                            0.15
above                           0.15
IVEN                            0.15
.yy                             0.15
Activation Density: 0.111%
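Activation density is usually read as the fraction of sampled tokens on which the feature fires. At 0.111%, that works out to roughly 1 token in 900. The sketch below assumes this definition and counts a token as active when its activation is strictly above zero. The function name, the `threshold` parameter and the toy activations are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of activations strictly greater than `threshold`.

    Assumption: a feature counts as "active" on a token when its
    activation exceeds `threshold` (0.0 by default).
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: the feature fires on 1 of 900 tokens.
acts = [0.0] * 899 + [2.3]
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.111%
```

Raising `threshold` gives a stricter notion of "firing" and lowers the reported density. The exact cutoff this dashboard uses is not stated.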