INDEX
Explanations
phrases indicating transitions or movements, particularly upward or in a positive direction
Negative Logits
Token       Logit
Citation    -0.71
cases       -0.54
boycot      -0.51
cause       -0.48
isphere     -0.48
KS          -0.48
terday      -0.48
2024        -0.47
inclusion   -0.47
anian       -0.47
Positive Logits
Token       Logit
ruff        0.61
ensed       0.60
senal       0.58
shenan      0.58
gettable    0.53
razy        0.52
proverbial  0.51
snipp       0.51
uncanny     0.50
tricks      0.50
Activations Density: 0.215%