INDEX
Explanations
instances where caring or lack of caring is expressed regarding different subjects
phrases expressing a lack of concern for various issues
Negative Logits
gallery                  -0.78
nown                     -0.70
EStreamFrame             -0.70
oda                      -0.70
BuyableInstoreAndOnline  -0.69
Cosponsors               -0.68
redits                   -0.68
Lay                      -0.68
omsky                    -0.68
ammy                     -0.67
POSITIVE LOGITS
preserving  0.94
respecting  0.91
whether     0.79
protecting  0.77
fairness    0.77
getting     0.76
maximizing  0.75
improving   0.74
upholding   0.73
optimizing  0.73
Activations Density 0.034%