INDEX
Explanations
references to specific events, actions, or statements in narratives
Negative Logits
Token       Logit
ſtate       -0.69
houſe       -0.68
CWE         -0.65
Chriftian   -0.63
Houſe       -0.62
itſelf      -0.61
DMETHOD     -0.60
ſelf        -0.60
ſche        -0.60
purpoſe     -0.60
Positive Logits
Token         Logit
went          0.56
immediately   0.56
let           0.52
promptly      0.52
不去          0.51
go            0.51
imme          0.49
proceeded     0.49
そのまま      0.47
Ignore        0.47
Activations Density: 0.409%