Explanations
expressions of consistency and reliability in various contexts
Negative Logits
rice          -0.16
ough          -0.16
oise          -0.15
кав           -0.15
zen           -0.15
icina         -0.15
scribe        -0.14
aver          -0.14
oni           -0.14
lum           -0.14
Positive Logits
ently          0.19
ably           0.17
aye            0.16
bred           0.16
inconsistent   0.16
ively          0.16
antly          0.15
ly             0.15
across         0.15
throughout     0.14
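Logit lists like these are commonly produced by projecting the feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive vocabulary entries. The sketch below illustrates that idea under stated assumptions: the function name `feature_logit_effects`, its inputs, and the toy weights are all hypothetical, and it ignores biases and any final layer-norm scaling, so it is not necessarily how the numbers above were computed.

```python
from typing import List, Sequence, Tuple


def feature_logit_effects(
    decoder_vec: Sequence[float],
    unembed: Sequence[Sequence[float]],
    vocab: Sequence[str],
    k: int = 10,
) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Project a feature's decoder direction through the unembedding.

    decoder_vec: length d_model (hypothetical input).
    unembed: d_model rows by len(vocab) columns (hypothetical input).
    Returns (k most negative, k most positive) as (token, logit) pairs.
    """
    d_model = len(decoder_vec)
    if len(unembed) != d_model:
        raise ValueError("unembed must have one row per residual dimension")
    # Direct effect on each vocabulary logit: dot product with that column of W_U.
    logits = [
        sum(decoder_vec[i] * unembed[i][j] for i in range(d_model))
        for j in range(len(vocab))
    ]
    # Sort ascending so the most suppressed tokens come first.
    ranked = sorted(zip(vocab, logits), key=lambda pair: pair[1])
    negative = ranked[:k]
    positive = list(reversed(ranked))[:k]
    return negative, positive


# Toy example with made-up numbers (not this feature's real weights).
neg, pos = feature_logit_effects(
    decoder_vec=[1.0, 0.5],
    unembed=[[0.1, -0.2, 0.3], [0.2, 0.0, -0.4]],
    vocab=["a", "b", "c"],
    k=1,
)
print(neg, pos)
```

Because the projection skips the rest of the network, it captures only the feature's direct effect on the output logits, which is why small values such as 0.19 or -0.16 are typical.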
Activation density: 0.030%
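A density of 0.030% corresponds to the feature firing on roughly 3 of every 10,000 tokens. The sketch below assumes density means the fraction of token positions whose activation is strictly above zero; the function name `activation_density` and the sample data are hypothetical, and the page's exact threshold and token sample are not stated here.

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Fraction of token positions where the feature activation exceeds threshold."""
    if len(activations) == 0:
        raise ValueError("need at least one activation")
    # Count positions where the feature is active (assumed: activation > threshold).
    firing = sum(1 for a in activations if a > threshold)
    return firing / len(activations)


# Hypothetical sample: 3 firing positions out of 10,000 tokens.
acts = [0.0] * 10_000
acts[10] = acts[500] = acts[9_000] = 1.2
print(f"{activation_density(acts):.3%}")  # 0.030%
```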