Explanations
verbs related to perception like "looked", "sounded", and "seemed"
expressions of perception or opinion related to situations or events
Negative Logits
por         -0.85
cedented    -0.83
opus        -0.77
veland      -0.75
enta        -0.74
eston       -0.70
yet         -0.70
arta        -0.69
enegger     -0.66
currently   -0.66
Positive Logits
DEV             0.80
initially       0.73
originally      0.71
beforehand      0.66
Soviet          0.66
wolves          0.65
earlier         0.65
tremend         0.62
unsuccessfully  0.61
behavi          0.60
Activations Density 0.697%