INDEX
Explanations
words or phrases enclosed in quotation marks
quotation marks and words associated with direct speech or quotes
Negative Logits
killers       -0.81
hypothal      -0.73
adjud         -0.73
buoy          -0.71
tabloid       -0.70
brittle       -0.70
acidic        -0.70
HDL           -0.70
overcrowd     -0.70
devastated    -0.70
Positive Logits
true          1.21
false         1.17
yes           1.11
ultimate      1.03
nothing       1.03
Alice         1.02
you           1.02
double        1.01
every         1.01
Yes           1.01
Activations Density 0.095%