INDEX
Explanations
adjectives and adverbs describing characteristics or qualities
phrases indicating the existence or status of something, particularly in a societal context
Negative Logits
lished       -0.79
osate        -0.69
culosis      -0.66
alus         -0.65
llah         -0.64
oice         -0.63
oire         -0.63
lag          -0.63
ij士          -0.63
keye         -0.62
Positive Logits
selves       0.92
MpServer     0.88
themselves   0.88
jointly      0.73
selves       0.72
atically     0.70
supposed     0.70
outnumbered  0.70
able         0.70
unmarked     0.69
Activations Density: 0.357%