Explanations
time expressions written in numerical format
punctuation marks and numerical values
Negative Logits
parach         -0.65
paranormal     -0.60
mixer          -0.59
interf         -0.58
omn            -0.58
iceberg        -0.58
insurrection   -0.57
imperson       -0.55
antidote       -0.55
investigative  -0.54
Positive Logits
06  1.51
08  1.51
05  1.50
09  1.47
07  1.43
00  1.42
04  1.42
02  1.41
03  1.38
95  1.37
Activations Density: 0.090%