Explanations
references to safety and well-being for individuals and communities
Negative Logits
Token     Logit
ehr       -0.16
ington    -0.16
avicon    -0.14
ampo      -0.14
ox        -0.14
ow        -0.14
abant     -0.14
Äĵ        -0.14
\CMS      -0.13
agnet     -0.13
Positive Logits
Token     Logit
theirs    0.54
ours      0.47
hers      0.46
mine      0.43
yours     0.40
Mine      0.35
mine      0.35
Mine      0.35
ones      0.31
others    0.24
Activations Density 0.170%
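
The density figure above can be read as the share of token positions on which the feature fires. The page does not say how the value was computed, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical and are not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions on which the
    feature is active. The source page does not define it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical data: 17 active positions out of 10,000 tokens.
sample = [1.0] * 17 + [0.0] * 9983
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.170% corresponds to roughly 17 active positions per 10,000 tokens.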