Explanations
phrases related to deserving recognition or attention
Negative Logits
unlawfully      -0.68
abs             -0.66
iple            -0.64
ricks           -0.64
stead           -0.64
rix             -0.63
misled          -0.62
wig             -0.62
isms            -0.60
ll              -0.60
Positive Logits
attention        1.00
ENTION           0.99
consideration    0.98
inclusion        0.92
scrutiny         0.89
emulation        0.89
contemplation    0.82
avorite          0.80
scorn            0.78
Attention        0.78
Activation Density: 0.107%