INDEX
Explanations
words related to maintenance or record-keeping
Negative Logits
NER      -0.81
mob      -0.74
sonian   -0.69
Merit    -0.68
nir      -0.65
hel      -0.65
ner      -0.65
�        -0.65
tf       -0.64
MAN      -0.63
Positive Logits
quiet        0.73
oulos        0.72
track        0.71
secret       0.70
meticulous   0.70
silent       0.68
tabs         0.68
busy         0.67
score        0.65
calm         0.65
Activations Density 0.038%