Explanations
connotations of condescension and dehumanization
Negative Logits
Reviewer     -0.83
ansas        -0.77
«ĺ           -0.77
FORE         -0.73
STD          -0.66
IRO          -0.66
Gors         -0.66
lly          -0.65
Spit         -0.64
instein      -0.63
Positive Logits
asking        1.20
ension        1.02
ensions       1.01
essential     0.90
ouch          0.88
multit        0.86
itude         0.83
agic          0.82
ributes       0.82
asks          0.82
Activations Density: 0.006%