INDEX
Explanations
phrases related to charitable activities or political initiatives aimed at making a difference
Negative Logits
constitu      -0.78
exha          -0.74
opausal       -0.71
discharge     -0.69
nerv          -0.68
reg           -0.66
discovery     -0.66
repetition    -0.65
flush         -0.65
occas         -0.64
Positive Logits
away          1.12
ings          0.99
Your          0.96
aways         0.96
Yourself      0.94
Away          0.94
ers           0.92
Maker         0.87
ership        0.86
Your          0.86
Activations Density 0.164%