INDEX
Explanations
phrases introducing examples or illustrating points
instances of illustrative examples or case studies
Negative Logits
inev      -0.85
ocr       -0.81
ess       -0.73
orate     -0.73
roy       -0.71
esses     -0.69
livion    -0.69
alysed    -0.68
ocracy    -0.68
ima       -0.65
Positive Logits
imagine   0.72
suppose   0.71
=#        0.65
hypot     0.64
Sergio    0.64
aeper     0.62
Buff      0.61
Brief     0.60
ooters    0.59
dinand    0.59
Activations Density: 0.125%