INDEX
Explanations
comparisons or similarities
phrases that express similarity or comparisons
Negative Logits
igion      -0.83
inion      -0.83
alez       -0.76
Language   -0.76
abases     -0.74
IAL        -0.74
ourse      -0.74
chin       -0.72
helicop    -0.72
aft        -0.69
Positive Logits
lier       0.92
liest      0.88
lihood     0.82
fir        0.67
crap       0.66
fun        0.65
filler     0.64
fireworks  0.64
pus        0.63
peas       0.63
Activations Density 0.023%