Explanations
negative or skeptical phrases about past actions and decisions
Negative Logits
Token           Logit
queſta          -0.77
témoig          -0.75
rungsseite      -0.74
ſind            -0.73
Efq             -0.70
dieſe           -0.68
Anſ             -0.68
Weiſe           -0.67
MemoryWarning   -0.67
iſchen          -0.66
Positive Logits
Token           Logit
一              0.36
it              0.34
T               0.34
phase           0.30
ends            0.30
seems           0.29
finit           0.29
pie             0.28
appears         0.28
affect          0.28
Activations Density 0.233%