INDEX
Explanations
sentences emphasizing a high degree of something
Negative Logits
ensis       -0.75
olor        -0.73
osal        -0.71
onis        -0.71
igi         -0.68
ourses      -0.66
sburgh      -0.66
adelphia    -0.65
amel        -0.64
iture       -0.63
Positive Logits
important    0.93
much         0.92
nice         0.90
informative  0.89
difficult    0.89
rare         0.89
similar      0.88
interesting  0.88
unlikely     0.88
exciting     0.86
Activations Density: 0.305%
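
The page does not define how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions at which the feature's activation is above zero: the function name `activation_density`, the `threshold` parameter, and the sample data are all hypothetical and used only for illustration.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of token positions where the
    feature fires at all; the source page does not state its definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Made-up data: 1000 token positions, 3 of which activate the feature.
sample = [0.0] * 997 + [0.4, 1.2, 2.5]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.305% would mean the feature fires on roughly 3 of every 1000 token positions, which suggests a fairly sparse feature.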