INDEX
Explanations
mentions of assessments or evaluations
the word "in" and its contextual variations within sentences
Negative Logits
ngth      -0.71
untled    -0.69
llular    -0.66
leep      -0.66
awei      -0.65
iris      -0.64
lashed    -0.62
sacked    -0.61
assies    -0.61
etsk      -0.60
Positive Logits
hindsight     1.37
itself        1.28
retrospect    1.23
terms         1.04
disguise      1.00
asm           0.97
comparison    0.96
context       0.94
humane        0.93
spite         0.87
Activations Density: 0.137%