Explanations
examples and explanations in a context
phrases that introduce examples or explanations
NEGATIVE LOGITS
roy           -0.78
ess           -0.70
gaard         -0.69
owed          -0.69
esses         -0.68
inate         -0.65
ige           -0.64
GM            -0.63
inev          -0.63
ND            -0.61
POSITIVE LOGITS
ierre          0.77
zech           0.69
hov            0.68
tti            0.68
trak           0.67
gans           0.66
Photographer   0.64
=#             0.64
ooters         0.63
Reflex         0.63
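Dashboards like this one commonly build the two logit lists by projecting the feature's decoder direction through the model's unembedding matrix. They then keep the k most negative and the k most positive per-token scores. The page does not say how its own values were computed, for example whether a final layer norm or any rescaling is applied. The following is therefore only a minimal sketch under that assumption. `logit_effects`, `top_logits`, and the toy matrices are hypothetical.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[str, float]


def logit_effects(decoder_dir: Sequence[float],
                  unembed: Sequence[Sequence[float]]) -> List[float]:
    """Project a feature's decoder direction (length d_model) through an
    unembedding matrix (d_model rows x d_vocab columns).

    Assumption: the score for token t is the plain dot product
    decoder_dir . unembed[:, t], with no layer norm or rescaling.
    """
    d_model = len(decoder_dir)
    d_vocab = len(unembed[0])
    return [
        sum(decoder_dir[i] * unembed[i][t] for i in range(d_model))
        for t in range(d_vocab)
    ]


def top_logits(decoder_dir: Sequence[float],
               unembed: Sequence[Sequence[float]],
               vocab: Sequence[str],
               k: int = 10) -> Tuple[List[Pair], List[Pair]]:
    """Return (negative, positive): the k lowest scores in ascending order
    and the k highest scores in descending order, as (token, score) pairs."""
    scores = logit_effects(decoder_dir, unembed)
    pairs = list(zip(vocab, scores))
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    return negative, positive


if __name__ == "__main__":
    # Toy example: d_model = 2, vocabulary of 4 tokens (made-up numbers).
    vocab = ["roy", "ess", "ierre", "zech"]
    decoder_dir = [1.0, -0.5]
    unembed = [
        [-0.6, -0.4, 0.7, 0.5],
        [0.2, 0.4, -0.1, -0.2],
    ]
    neg, pos = top_logits(decoder_dir, unembed, vocab, k=2)
    print("negative:", neg)
    print("positive:", pos)
```

Sorting the full vocabulary keeps the sketch short. For a real vocabulary of tens of thousands of tokens, `heapq.nsmallest` and `heapq.nlargest` would avoid a full sort.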
Activation density: 0.104%
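Activation density is usually read as the fraction of token positions on which the feature fires. The page does not state its threshold or how many tokens were sampled. Assuming "fires" means an activation strictly above zero, the hypothetical helper `activation_density` below gives a minimal sketch:

```python
from typing import Sequence


def activation_density(activations: Sequence[float],
                       threshold: float = 0.0) -> float:
    """Fraction of token positions whose activation exceeds `threshold`.

    Assumption: "active" means strictly greater than the threshold
    (default 0.0); the dashboard's actual rule is not stated.
    """
    if not activations:
        raise ValueError("need at least one activation")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


if __name__ == "__main__":
    # Made-up sample: 104 active positions out of 100,000 tokens.
    acts = [0.0] * 99_896 + [1.0] * 104
    print(f"{activation_density(acts) * 100:.3f}%")
```

Under this reading, a density of 0.104% means the feature is active on roughly one token in a thousand.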