INDEX
Explanations
truths and their complexities within narratives
Negative Logits
BorderRadius    -0.65
cluye           -0.63
noft            -0.61
grew            -0.59
fusca           -0.59
supérieurs      -0.58
pères           -0.57
schuldig        -0.57
adquis          -0.56
fizer           -0.55
Positive Logits
spoken          0.95
played          0.83
argued          0.82
fought          0.82
talked          0.81
shouted         0.81
uttered         0.80
told            0.80
whispered       0.80
danced          0.79
Activations Density: 0.533%