INDEX
Explanations
contractions of words with specific characters such as 'n't'
instances of the word "wouldn't" in various contexts
Negative Logits
æ³        -0.68
ARM       -0.68
story     -0.66
ULT       -0.63
PI        -0.62
Case      -0.62
PU        -0.61
ocal      -0.60
Adv       -0.60
agency    -0.60
Positive Logits
't          1.09
ģĸ          0.82
never       0.79
terness     0.78
atically    0.76
geon        0.76
¹           0.74
surely      0.73
¨           0.73
±           0.73
Activations Density 0.009%