INDEX
Explanations
words related to professions and work environments
phrases related to law and authority figures
Negative Logits
lamented        -0.61
®               -0.60
CLS             -0.60
arnaev          -0.59
eagerly         -0.59
mere            -0.58
famed           -0.58
Released        -0.58
"#              -0.57
surprisingly    -0.57
Positive Logits
[               1.26
['              1.25
everybody       1.10
â̦"             1.08
..."            1.07
,"              1.07
somebody        1.06
.''             1.06
."              1.05
,'"             1.05
Activations Density: 1.337%