INDEX
Explanations
reassuring statements to alleviate worries or fears
Negative Logits
artney         -0.76
avor           -0.73
ourses         -0.66
progressively  -0.65
iband          -0.63
decom          -0.60
urate          -0.60
avorite        -0.60
arching        -0.60
olid           -0.59
Positive Logits
!              1.00
!:             0.98
!]             0.90
ladies         0.88
!),            0.87
!).            0.87
!)             0.83
!,             0.81
!!             0.80
folks          0.80
Activations Density 0.110%