INDEX
Explanations
words related to astonishment or surprise
Negative Logits
terday      -0.80
istance     -0.72
ibel        -0.72
andre       -0.69
pai         -0.69
idences     -0.68
uctor       -0.66
hip         -0.66
itism       -0.66
idential    -0.65
Positive Logits
gers        0.86
warts       0.84
matic       0.81
asus        0.79
weed        0.79
ga          0.78
ues         0.76
arty        0.74
Champ       0.74
ogg         0.73
Activations Density 0.028%