INDEX
Explanations
repetitive phrases indicating a minimum or baseline condition
Negative Logits
orean          -0.65
audition       -0.63
�              -0.62
bilingual      -0.62
appeal         -0.61
compete        -0.61
competition    -0.61
boxed          -0.61
者             -0.59
annex          -0.57
Positive Logits
nil            0.88
acea           0.83
fortunately    0.80
ravis          0.79
VIDIA          0.78
last           0.77
heres          0.76
iatus          0.75
thens          0.74
poses          0.73
Activations Density: 0.023%