INDEX
Explanations
words related to negative or criminal activities, such as trouble, jail, arrested, or bankrupt
phrases related to legal consequences and punishments
Negative Logits
ript            -0.66
refresh         -0.58
orney           -0.57
supplemented    -0.56
RFC             -0.55
quart           -0.54
curated         -0.54
visual          -0.54
BST             -0.53
inaug           -0.53
Positive Logits
persecution     0.69
ogenic          0.68
infring         0.68
offenders       0.65
martyr          0.64
havoc           0.64
victims         0.63
whistleblowers  0.63
criminals       0.63
Prevention      0.62
Activations Density: 0.568%