Explanations
instances where something is being treated or perceived differently
situations or concepts that involve contrasting treatment or approaches
Negative Logits
Token          Logit
Encyclopedia   -0.75
Zion           -0.69
Dmitry         -0.67
á              -0.66
Dou            -0.65
Plate          -0.65
O              -0.64
Upton          -0.63
advertising    -0.63
Fritz          -0.62
Positive Logits
Token           Logit
iating          1.03
etheless        0.98
wcs             0.97
iates           0.94
differently     0.93
differentiated  0.89
iator           0.86
styles          0.85
different       0.84
minded          0.83
Activations Density: 0.005%