INDEX
Explanations
expressions of dialogue, particularly those indicating emphasis or strong feelings
speaking about individuals or groups in a derogatory or condescending manner
Negative Logits
Seym         -0.75
mathemat     -0.69
tabloid      -0.68
limb         -0.68
ivory        -0.68
seiz         -0.67
Gardens      -0.67
amusement    -0.66
metic        -0.66
tasting      -0.66
Positive Logits
ï¸ı          1.06
rd           0.99
lean         0.95
resent       0.93
deg          0.89
vernment     0.89
ï¸           0.85
PB           0.84
audi         0.84
ãĥĥãĥī       0.84
Activations Density 0.030%