Explanations
mentions of legal or political figures
non-functional or low-context phrases in a text
Negative Logits
eleph         -0.82
mutually      -0.81
disciplines   -0.77
synerg        -0.77
actively      -0.76
symb          -0.76
reflex        -0.76
endeav        -0.76
favor         -0.75
casc          -0.75
Positive Logits
Copyright      1.52
They           1.44
Advertisement  1.41
Officials      1.39
Topics         1.38
Sources        1.37
It             1.35
According      1.35
Both           1.35
Newsletter     1.33
Activation Density: 0.561%