INDEX
Explanations
phrases related to actions or decisions with significant impact and consequences
phrases related to regulatory policies and their implications
Negative Logits
Tud       -0.64
Kardash   -0.62
aer       -0.62
Vers      -0.61
Roose     -0.61
didnt     -0.61
Benn      -0.60
Aer       -0.59
Awakens   -0.59
Dunk      -0.56
Positive Logits
onite     0.77
lie       0.70
senal     0.69
combe     0.68
etheless  0.68
Ͻ         0.66
osate     0.66
-+        0.65
JECT      0.65
ifully    0.64
Activations Density: 1.263%