Explanations
phrases indicating the exclusion or dismissal of certain possibilities
phrases indicating the denial or dismissal of possibilities
Negative Logits
antry        -0.77
conflic      -0.75
inki         -0.73
itialized    -0.72
elf          -0.72
iosyn        -0.71
awar         -0.70
omsky        -0.69
resil        -0.69
feel         -0.67
Positive Logits
posts         0.83
enance        0.71
nels          0.69
anything      0.68
pedestrians   0.67
stadt         0.64
negatives     0.64
dissenting    0.63
Sussex        0.63
Ernst         0.62
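The sketch below shows one common way such logit lists are produced. It assumes the scores come from projecting the feature's decoder direction through the model's unembedding matrix and ranking the vocabulary. This page does not confirm that convention, and some dashboards also apply normalization first. The function name `top_logits` and the toy matrices are hypothetical.

```python
import numpy as np

def top_logits(w_dec_row, W_U, vocab, k=10):
    """Hypothetical helper: rank tokens by the feature's direct effect on logits.

    Assumes w_dec_row has shape (d_model,) and W_U has shape (d_model, vocab_size).
    """
    scores = w_dec_row @ W_U            # one score per vocabulary token
    order = np.argsort(scores)          # ascending: most negative first
    neg = [(vocab[i], float(scores[i])) for i in order[:k]]
    pos = [(vocab[i], float(scores[i])) for i in order[::-1][:k]]
    return neg, pos

# Toy example (made-up numbers, d_model = 2, vocabulary of 4 tokens)
vocab = ["a", "b", "c", "d"]
W_U = np.array([[0.5, -0.2, 0.1, 0.9],
                [0.3,  0.4, -0.6, 0.0]])
neg, pos = top_logits(np.array([1.0, 0.0]), W_U, vocab, k=2)
print("negative:", neg)
print("positive:", pos)
```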
Activation Density: 0.031%
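Activation density is read here as the share of sampled tokens on which the feature's activation is above zero. The zero threshold and the sample are assumptions, not stated on this page. For example, one active token out of 3,200 gives 1/3200 ≈ 0.031%. The helper `activation_density` below is hypothetical.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper: fraction of token positions where the feature fires."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Toy sample: 3,200 token positions, the feature is active on exactly one
acts = np.zeros(3200)
acts[17] = 2.5
d = activation_density(acts)
print(f"density = {d:.6f} ({d * 100:.4f}%)")
```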