INDEX
Explanations
proper nouns
words that serve as introductory or transitional phrases in sentences
Negative Logits
Mate       -0.64
lit        -0.60
âĢº        -0.59
',         -0.56
tered      -0.56
virgin     -0.56
umbered    -0.55
Rodrig     -0.55
'.         -0.55
Anon       -0.55
Positive Logits
Else       0.80
mosp       0.76
APS        0.73
Weak       0.73
Quantity   0.70
hesda      0.70
:(         0.69
jriwal     0.67
alyst      0.66
yrinth     0.65
Activations Density 0.117%
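
For readers unfamiliar with the density figure, here is a minimal sketch of how such a number could be computed. It assumes density means the fraction of sampled token positions where the feature's activation is above zero; the page itself does not state the definition, and the function name `activation_density` and the toy data are hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" is the share of sampled tokens on which the
    feature fires at all. The source page does not define it.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Toy data (made up): the feature fires on 1 of 1000 token positions.
sample = [0.0] * 1000
sample[3] = 2.5
density = activation_density(sample)
print(f"{density:.3%}")  # prints 0.100%
```

Under that assumed definition, a density of 0.117% would correspond to the feature firing on roughly one token in every 855.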