INDEX
Explanations
instances where someone or something is responsible for a specific action or outcome
phrases indicating accountability or responsibility
Negative Logits
edin        -0.85
Sport       -0.81
ellen       -0.78
notations   -0.75
opol        -0.74
alk         -0.73
lime        -0.73
nets        -0.72
fair        -0.72
Brow        -0.70
Positive Logits
overseeing     1.04
maintaining    1.01
safegu         0.99
regulating     0.99
ensuring       0.96
protecting     0.95
preserving     0.90
upholding      0.90
constructing   0.89
upkeep         0.88
Activations Density 0.069%