Explanations
mentions of individuals or groups related to news reporting or investigations
instances of the prefix "rep" and variations of it
Negative Logits
plausible     -0.59
congr         -0.57
deceptive     -0.56
accur         -0.55
explicit      -0.55
daring        -0.55
disturbing    -0.54
tantal        -0.53
apt           -0.53
generously    -0.53
Positive Logits
icity         0.81
ications      0.78
lain          0.76
iates         0.71
igham         0.71
emort         0.70
swick         0.69
icator        0.67
ient          0.67
agues         0.67
Activations Density 0.157%
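
The density figure is most naturally read as the share of sampled token positions on which the feature fires at all. The source does not define it, so the sketch below is only an illustration under that assumption. The function name `activation_density`, the `threshold` parameter, and the toy data are all hypothetical and do not come from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of token positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of positions where the feature is active.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical toy data: 2000 token positions, 3 of which are active.
toy = [0.0] * 2000
toy[10], toy[500], toy[1999] = 1.2, 0.4, 3.1

density = activation_density(toy)
print(f"{density:.3%}")  # prints 0.150%
```

Under this reading, a density of 0.157% would correspond to roughly 1 to 2 active positions per 1,000 sampled tokens.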