Explanations
references to political or social criticism regarding specific groups or situations
Negative Logits
myſelf            -0.94
MigrationBuilder  -0.90
itſelf            -0.89
MLLoader          -0.85
disambiguazione   -0.81
houſe             -0.76
ModelExpression   -0.76
iſt               -0.76
InjectAttribute   -0.75
pleaſure          -0.74
Positive Logits
idiotic      0.60
https        0.59
propaganda   0.58
fascist      0.57
moron        0.57
incompetent  0.56
😡           0.56
morons       0.55
stupidity    0.55
https        0.55
Activations Density: 1.458%