INDEX
Explanations
references to injuries and accidents
references to cultural or artistic critiques
Negative Logits
[/              -0.82
shown           -0.82
utilizing       -0.74
comprised       -0.73
¶ħ              -0.72
prior           -0.72
âμ              -0.71
foreseen        -0.69
approximately   -0.68
approximately   -0.68
Positive Logits
eds             0.81
quarrel         0.70
medicines       0.70
bribe           0.68
supermarkets    0.67
Newsletter      0.67
starve          0.66
complains       0.65
beware          0.65
piety           0.63
Activation Density: 1.637%