INDEX
Explanations
phrases related to correctness, accuracy, or being precise
instances of the word "right" in various contexts
Negative Logits
ĸļ        -0.81
arettes   -0.69
ipation   -0.66
riber     -0.65
ains      -0.64
Bei       -0.63
gat       -0.63
ulz       -0.62
arette    -0.62
irl       -0.62
Positive Logits
eous      1.16
shore     0.80
wing      0.77
winger    0.75
ness      0.69
smack     0.69
ocrin     0.68
headed    0.67
ened      0.67
haven     0.67
Activations Density: 0.047%