INDEX
Explanations
phrases related to criminal activities or controversies
words associated with significant events or crises
Negative Logits
ertain        -0.60
ibaba         -0.60
nosis         -0.57
phas          -0.56
erent         -0.56
proble        -0.55
eree          -0.55
wills         -0.53
uniqueness    -0.50
deserts       -0.50
Positive Logits
shortly       1.10
yesterday     1.03
,[            1.00
last          0.96
earlier       0.89
Wednesday     0.88
Monday        0.87
Tuesday       0.87
after         0.86
Thursday      0.86
Activations Density 0.663%