INDEX
Explanations
references to charitable actions and personal connections to communities
Negative Logits
ortal   -0.16
llib    -0.15
anded   -0.15
322     -0.14
edio    -0.14
Metro   -0.14
riv     -0.13
PPER    -0.13
ponge   -0.13
argv    -0.13
Positive Logits
dear        0.45
importance  0.29
important   0.28
personal    0.28
meaningful  0.27
意义        0.27
important   0.27
meaning     0.26
Importance  0.26
Important   0.25
Activations Density 0.230%