INDEX
Explanations
words or phrases expressing admiration or negative sentiments towards individuals
adoration, abhorrence, adoptive transfer
Negative Logits
Startup     -0.47
policia     -0.44
promp       -0.42
Result      -0.42
punt        -0.42
rheumat     -0.42
Marke       -0.42
definit     -0.41
Chemin      -0.41
FFIC        -0.41
Positive Logits
adore       1.53
adored      1.45
adoration   1.13
adoro       1.05
adore       1.04
ador        0.86
Adorable    0.76
älskar      0.75
adorable    0.73
paixão      0.70
Activations Density: 0.002%