Explanations
phrases and constructs indicating contrast or contradiction
Negative Logits
Token    Logit
DIE      -0.17
ecies    -0.15
że       -0.15
entic    -0.14
Pend     -0.14
rouch    -0.14
Die      -0.14
ismu     -0.14
gings    -0.14
gens     -0.14
Positive Logits
Token           Logit
FetchRequest    0.15
cher            0.15
ikip            0.14
342             0.14
437             0.14
/Object         0.14
ÅŁi             0.13
lient           0.13
853             0.13
OOD             0.13
Activations Density 0.239%