INDEX
Explanations
phrases related to judgments or assessments of situations or conditions
expressions of perception or opinion
Negative Logits
arta        -0.85
been        -0.80
tan         -0.75
heed        -0.74
por         -0.72
wall        -0.70
ogie        -0.66
Yourself    -0.66
veland      -0.66
coin        -0.65
Positive Logits
initially    0.75
abruptly     0.69
tremend      0.69
originally   0.69
nesday       0.68
seism        0.68
hes          0.65
hers         0.63
noticeably   0.63
briefly      0.62
Activations Density: 0.428%