INDEX
Explanations
Phrases related to accountability and personal responsibility in social and political contexts.
Negative Logits
wolf          -0.15
anj           -0.15
ablo          -0.14
äs            -0.14
olders        -0.14
สต            -0.14
ood           -0.14
villa         -0.14
Hospitality   -0.13
ork           -0.13
Positive Logits
nor           0.19
usta          0.17
ushi          0.17
egral         0.15
nor           0.15
amm           0.15
.Framework    0.15
Greenwood     0.14
ě             0.14
кта           0.14
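Logit values like those above are commonly read as the feature's direct effect on the output vocabulary. A minimal sketch, assuming the usual convention of dotting the feature's decoder direction with each token's column of the model's unembedding matrix; the function names and toy weights are hypothetical, and the dashboard's exact pipeline is not stated here:

```python
from typing import Dict, List, Tuple

def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Dot the (assumed) feature decoder direction with each token's unembedding column."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, col))
            for tok, col in unembed.items()}

def top_and_bottom(effects: Dict[str, float],
                   k: int) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative tokens by logit effect."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending
    positive = ranked[::-1][:k]  # largest effects first
    negative = ranked[:k]        # most negative effects first
    return positive, negative

# Hypothetical toy model: 2-dimensional residual stream, 3-token vocabulary.
decoder_dir = [1.0, -0.5]
unembed = {"a": [0.2, 0.1], "b": [-0.1, 0.2], "c": [0.0, 0.0]}
effects = logit_effects(decoder_dir, unembed)
positive, negative = top_and_bottom(effects, k=1)
```

Because the effect is a single linear map, it ignores the final layer norm and any downstream layers; it describes what the feature pushes toward directly, not its full causal effect.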
Activation density: 0.287%
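Activation density is presumably the share of sampled token positions on which the feature fires. A sketch under that assumption, using a hypothetical firing threshold of zero (the dashboard's actual threshold and sample size are not given):

```python
from typing import Iterable

def activation_density(acts: Iterable[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold` (assumed 0)."""
    values = list(acts)
    if not values:
        return 0.0
    fired = sum(1 for a in values if a > threshold)
    return 100.0 * fired / len(values)

# Hypothetical sample: the feature fires on 287 of 100,000 token positions.
sample = [1.0] * 287 + [0.0] * (100_000 - 287)
density = activation_density(sample)
```

With these hypothetical counts the result is 0.287%, so a density that low means the feature is silent on the vast majority of tokens.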