Explanations
words related to harmful actions or events and their consequences
phrases related to crime and its consequences
Negative Logits
================================  -0.61
Lets                              -0.59
Majesty                           -0.59
advoc                             -0.57
¯¯¯¯¯¯¯¯                          -0.57
idth                              -0.55
Whilst                            -0.53
BuyableInstoreAndOnline           -0.53
Bachelor                          -0.52
Tuls                              -0.52
Positive Logits
afterward                          1.28
afterwards                         1.01
elsewhere                          0.98
later                              0.93
thereafter                         0.82
earlier                            0.79
abroad                             0.79
nearby                             0.78
Enlarge                            0.77
beforehand                         0.71
Activations Density 0.617%