INDEX
Explanations
sentences or statements
punctuation and expressions of strong emotion or surprise
Negative Logits
accomp      -0.86
rall        -0.81
challeng    -0.79
pione       -0.79
exting      -0.78
targ        -0.78
teasp       -0.77
concess     -0.77
suspic      -0.77
necess      -0.76
Positive Logits
Because     1.44
Didn        1.43
Anyway      1.38
Sorry       1.38
There       1.37
Exactly     1.37
You         1.36
Actually    1.35
Nothing     1.34
Unless      1.33
Activations Density: 0.236%