INDEX
Explanations
mentions of specific or emphasized items or concepts within a context
references to specific topics or entities
Negative Logits
lyn  -0.76
unts  -0.74
IR  -0.69
board  -0.69
UD  -0.66
911  -0.66
USD  -0.66
NI  -0.65
li  -0.64
bane  -0.64
Positive Logits
ties  0.94
ities  0.89
embodiments  0.82
iates  0.81
styles  0.80
batches  0.77
izations  0.77
wcs  0.77
identifiable  0.76
isations  0.76
Activation Density: 0.014%