INDEX
Explanations
phrases indicating an impression or perception
instances of the word "impression" and its various contexts
Negative Logits
annex       -0.68
annis       -0.68
foreseen    -0.67
fighting    -0.65
cross       -0.65
challeng    -0.64
yne         -0.64
vez         -0.63
ccording    -0.63
cale        -0.62
Positive Logits
impression   1.12
impressions  1.00
uren         0.84
eful         0.81
eless        0.76
Poc          0.72
perceptions  0.71
scars        0.69
uated        0.68
able         0.68
Activations Density: 0.017%