Explanations
references to community initiatives or societal issues that involve giving back or helping others
Negative Logits
Token        Logit
clean        -0.61
benefited    -0.59
erased       -0.59
ancial       -0.58
itivity      -0.57
spoiled      -0.57
FTWARE       -0.56
aceutical    -0.56
condition    -0.55
alpha        -0.55
Positive Logits
Token        Logit
mosp         1.21
least        1.17
hens         1.08
kinson       0.98
times        0.90
letico       0.90
omic         0.89
las          0.88
ention       0.84
dusk         0.79
Activations Density: 0.062%