Explanations
connections and interactions between people
Negative Logits
ania       -0.16
ůž         -0.16
udd        -0.14
ANNOT      -0.14
▲          -0.13
ereotype   -0.13
quest      -0.13
ovit       -0.13
urb        -0.13
녀         -0.13
Positive Logits
about      0.53
about      0.41
About      0.37
About      0.37
_about     0.36
ABOUT      0.36
-about     0.35
tentang    0.33
关于       0.32
.about     0.26
Activation Density: 0.034%
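Activation density is most plausibly the share of token positions on which the feature fires. The page states neither the size of the sample it was measured on nor the firing threshold. The following is a minimal sketch that assumes "fires" means an activation strictly above zero. The function name `activation_density` and the sample data are hypothetical.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of token positions whose activation exceeds `threshold`
    (assumed firing criterion)."""
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical sample: the feature fires on 3 of 10,000 token positions.
acts = [0.0] * 10_000
for i in (17, 4_203, 9_998):
    acts[i] = 1.7

print(f"{activation_density(acts):.3f}%")  # → 0.030%
```

Read this way, a density of 0.034% means the feature is active on roughly 3 or 4 tokens in every 10,000.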