Explanations
statements of blame or self-recrimination
Negative Logits
oprot          -0.88
Iné            -0.68
―――――          -0.66
NUMX           -0.66
незавершена    -0.65
Houſe          -0.65
cknow          -0.65
Monfieur       -0.65
NDEBUG         -0.65
itſelf         -0.64
Positive Logits
!              0.57
!!!            0.55
!!             0.53
!!!!           0.52
!”             0.50
!"             0.48
!              0.47
ufen           0.44
inspiração     0.44
C              0.44
Activation Density: 0.193%