INDEX
Explanations
phrases related to critiques or evaluations
Negative Logits
Token           Logit
ngth            -0.75
lished          -0.70
thood           -0.67
apons           -0.64
interrupted     -0.64
successfully    -0.63
iencies         -0.63
enaries         -0.62
reys            -0.61
gang            -0.61
Positive Logits
Token             Logit
considering        1.15
huh                0.86
eh                 0.76
given              0.74
Canaver            0.74
coincidence        0.74
hindsight          0.70
hypocritical       0.67
omission           0.67
understatement     0.67
Activation density: 2.003%