INDEX
Explanations
phrases related to opinions and statements
comma-separated lists or phrases
Negative Logits
Token        Logit
,            -0.81
,...         -0.75
"""          -0.71
SourceFile   -0.71
-            -0.68
!,           -0.68
vale         -0.68
Previous     -0.67
Wire         -0.67
Actor        -0.67
Positive Logits
Token        Logit
somew        0.69
diminishing  0.57
rudimentary  0.56
stellar      0.55
tacit        0.54
nonexistent  0.54
incomplete   0.53
utenberg     0.53
ifiable      0.52
criminally   0.52
Activations Density: 0.124%