INDEX
Explanations
pronouns and verbs related to communication
references to individuals in the context of communication
Negative Logits
ikan       -0.64
ibal       -0.62
arious     -0.60
hift       -0.60
akia       -0.59
xtap       -0.58
inf        -0.58
ierre      -0.58
ilde       -0.57
Category   -0.56
Positive Logits
DERR       0.92
selves     0.86
goodbye    0.85
ij士       0.75
orally     0.74
terday     0.68
daughters  0.67
farewell   0.67
privately  0.67
self       0.65
Activations Density 0.110%