Explanations
dialogue and interactions between characters
Negative Logits
Token        Logit
ê¸ī          -0.16
rett         -0.15
intrinsic    -0.15
uco          -0.14
ertia        -0.14
aná          -0.14
StrictEqual  -0.14
struments    -0.14
egret        -0.14
paris        -0.14
Positive Logits
Token        Logit
Pra          0.16
iness        0.16
Phelps       0.15
lug          0.15
Som          0.14
actable      0.14
Hoy          0.14
thal         0.14
Civil        0.14
Lewis        0.14
Activations Density 0.560%