INDEX
Explanations
phrases related to challenges or controversial issues
strong statements or claims in a discussion
Negative Logits
ials        -0.69
artments    -0.68
yss         -0.66
ancers      -0.66
gments      -0.63
ants        -0.61
bats        -0.61
ummies      -0.61
ctors       -0.60
istar       -0.60
Positive Logits
nowhere     0.82
ģĸ          0.80
sorely      0.79
sole        0.75
neither     0.75
borne       0.72
underpin    0.71
appl        0.71
antit       0.69
nothing     0.68
Activations Density 0.314%
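
The dashboard does not say how the density figure is computed. A minimal sketch, assuming it is the fraction of sampled token positions on which the feature's activation is above zero; the function name, the threshold parameter, and the sample data are hypothetical and chosen only for illustration:

```python
def activation_density(activations, threshold=0.0):
    """Return the fraction of positions whose activation exceeds `threshold`.

    Assumption: "density" means the share of sampled tokens on which the
    feature fires at all. The source page does not confirm this definition.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Hypothetical sample: 1000 token positions, 3 of which activate the feature.
sample = [0.0] * 997 + [1.2, 0.4, 2.7]
density = activation_density(sample)
print(f"{density:.3%}")
```

Under this reading, a density of 0.314% would mean the feature fires on roughly 3 of every 1000 sampled tokens.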