INDEX
Explanations
phrases related to lack of change or progress
phrases indicating the presence of signs or indicators of various conditions or situations
Negative Logits
ahime       -0.78
halla       -0.68
ISH         -0.66
adish       -0.63
adelphia    -0.62
ARY         -0.61
lda         -0.60
rior        -0.60
ISO         -0.60
ategory     -0.59
Positive Logits
Signs       0.98
signs       0.95
posts       0.87
atories     0.86
igmatic     0.83
ifact       0.78
emic        0.77
matic       0.75
oresc       0.75
trump       0.72
Activations Density: 0.017%