INDEX
Explanations
mentions of making a positive impact or difference
references to making a positive impact or difference
Negative Logits
oked        -0.84
IDES        -0.70
©¶æ         -0.69
cephal      -0.69
stra        -0.69
earchers    -0.68
urat        -0.68
inges       -0.66
Strauss     -0.64
dden        -0.63
Positive Logits
Shape       0.70
EFF         0.67
IAL         0.65
ļéĨĴ        0.64
whatsoever  0.63
wherever    0.61
ileaks      0.60
hole        0.60
hift        0.60
Marginal    0.59
Activations Density 0.021%