INDEX
Explanations
specific names or terms
specific names and terms associated with entities or concepts in a given context
Negative Logits
Token     Logit
Advoc     -0.83
prost     -0.80
ATT       -0.75
advoc     -0.73
Akira     -0.73
Dynam     -0.72
sacks     -0.72
Oscars    -0.72
ADV       -0.72
ISS       -0.72
Positive Logits
Token     Logit
ne        1.29
te        1.23
me        1.16
ple       1.07
ffe       1.05
pine      1.05
le        1.02
mes       1.02
ten       1.00
onse      0.97
Activations Density 0.294%