INDEX
Explanations
references to interpersonal relationships and characters involved in dialogues
Negative Logits
atham       -0.18
Zimmerman   -0.15
pin         -0.15
tero        -0.15
Bowling     -0.15
neutral     -0.15
vet         -0.14
relations   -0.14
l           -0.14
ior         -0.14
Positive Logits
FORCE       0.16
.Layer      0.15
ustr        0.15
åĿĽ         0.15
vertiser    0.14
orns        0.14
-sama       0.14
fel         0.14
felony      0.14
ILD         0.14
Activations Density 0.094%