Explanations
comparisons between different individuals or entities, often highlighting contrasting behaviors or qualities
Negative Logits
assi        -0.18
iasi        -0.18
.toolbox    -0.16
ASI         -0.16
ibi         -0.15
utow        -0.15
antz        -0.14
yonel       -0.14
seealso     -0.14
もう        -0.14
Positive Logits
respectively  0.33
respective    0.27
each          0.26
each          0.25
both          0.23
former        0.22
both          0.21
neither       0.21
Each          0.21
Both          0.21
Activations Density 0.412%