INDEX
Explanations
phrases that indicate a place or a grouping of people
phrases that emphasize membership or prevalence in a category
Negative Logits
token          logit
culosis        -0.67
accordingly    -0.66
gaze           -0.60
perpend        -0.60
hs             -0.59
dispose        -0.59
comply         -0.58
tion           -0.57
mustache       -0.57
understands    -0.55
Positive Logits
token          logit
inea           0.74
icial          0.72
elong          0.71
sted           0.67
onel           0.67
apest          0.65
odox           0.64
ãĤ¨            0.64
itol           0.64
icer           0.64
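
The positive-logit entry `ãĤ¨` looks like encoding damage, but it may simply be a token shown in a byte-level BPE vocabulary's printable form. The sketch below assumes the model uses a GPT-2-style byte-to-unicode table (the dashboard does not say which tokenizer is in use) and maps such a string back to the text it stands for. The helper names are illustrative, not part of any dashboard API.

```python
def bytes_to_unicode():
    """Rebuild the GPT-2-style table mapping each byte (0-255) to a printable character."""
    # Bytes that are already printable keep their own code point.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Every remaining byte is assigned a code point starting at 256.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


def decode_byte_level_token(token):
    """Map a vocabulary string back to raw bytes, then decode those bytes as UTF-8."""
    char_to_byte = {ch: b for b, ch in bytes_to_unicode().items()}
    raw = bytes(char_to_byte[ch] for ch in token)
    # Partial multi-byte sequences are replaced rather than raising an error.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(decode_byte_level_token("ãĤ¨"))
```

Under that assumption the three characters correspond to the bytes E3 82 A8, which is the UTF-8 encoding of the katakana character エ. If the model uses a different tokenizer, the mapping does not apply and the entry should be read as listed.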
Activations Density 0.172%