INDEX
Explanations
sentences or phrases expressing personal opinions
expressions of doubt or skepticism
Negative Logits
Token       Logit
resultant   -0.64
.'"         -0.62
,'"         -0.61
!'"         -0.59
motif       -0.59
',"         -0.59
!--         -0.58
invention   -0.57
'."         -0.57
"""         -0.56
Positive Logits
Token       Logit
renheit     0.70
Lisa        0.66
cler        0.65
anamo       0.63
HAM         0.60
leep        0.60
HAEL        0.60
wives       0.60
Fle         0.59
Roads       0.59
Activations Density 1.341%