INDEX
Explanations
phrases indicating a decision or choice
phrases indicating personal responsibility
Negative Logits
hello        -0.78
isky         -0.74
iked         -0.70
mol          -0.67
anguages     -0.67
ense         -0.67
TEXT         -0.67
listed       -0.66
artifacts    -0.66
ãĥŁ          -0.66
Positive Logits
discretion        1.07
whoever           0.91
imagination       0.83
interpretation    0.83
judges            0.81
policymakers      0.81
professionals     0.80
Allaah            0.77
consumers         0.74
shoulders         0.73
Activations Density 0.180%