Explanations
acts of kindness and charitable gestures
references to generosity and altruism
Negative Logits
Bam             -0.73
Showdown        -0.69
Ys              -0.69
fund            -0.67
INGTON          -0.66
ciation         -0.65
Sphere          -0.64
Milky           -0.64
ASAP            -0.63
Kob             -0.61
Positive Logits
ttes            0.84
depending       0.82
pas             0.79
times           0.74
oots            0.73
entimes         0.71
enos            0.70
incorrectly     0.70
pmwiki          0.70
underestimated  0.68
Activations Density: 0.450%