INDEX
Explanations
phrases that indicate a contrast or exception
evaluations and opinions about individuals or circumstances
Negative Logits
hiba        -0.65
sqor        -0.64
cember      -0.62
Uriel       -0.62
Ezek        -0.59
roxy        -0.58
glimps      -0.58
Quote       -0.57
backdrop    -0.57
pez         -0.56
Positive Logits
necessarily  1.26
anymore      1.24
nor          1.20
anything     1.18
nor          1.11
any          1.10
EVER         1.05
slightest    1.00
anything     0.99
ANY          0.96
Activations Density: 0.237%