INDEX
Explanations
relationships and interactions among characters in social situations
Negative Logits
rea      -0.15
ãģĴ      -0.15
cmp      -0.14
NSS      -0.14
uj       -0.14
aut      -0.14
uci      -0.14
еÑĢж     -0.14
Reyes    -0.13
ila      -0.13
Positive Logits
instead  0.23
instead  0.21
rather   0.20
alone    0.19
alone    0.19
rather   0.18
Instead  0.18
Ital     0.18
Alone    0.17
Rather   0.16
Activations Density: 0.232%