INDEX
Explanations
instances where something is considered problematic or controversial
phrases related to disagreement or opposition
Negative Logits
interstitial   -0.73
ioned          -0.73
icter          -0.72
exting         -0.68
weeney         -0.68
wich           -0.67
hei            -0.66
wake           -0.66
ocene          -0.64
ewater         -0.64
Positive Logits
nor            0.93
anybody        0.92
anyone         0.80
anymore        0.73
either         0.72
anything       0.70
Explorer       0.67
enjoyment      0.65
or             0.64
temptation     0.63
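The two lists above rank vocabulary tokens by how strongly this feature pushes their logits down or up. The dashboard does not say how these values are computed. A common approach for SAE dashboards, assumed here, is to take the dot product of the feature's decoder direction with each token's unembedding vector and then keep the k most negative and k most positive results. The sketch below uses hypothetical names (`logit_effects`, `top_bottom`) and a toy 2-dimensional vocabulary whose vectors are invented so that the projections reproduce four of the values listed above.

```python
from typing import Dict, List, Tuple

# Hypothetical sketch: assumes a token's logit effect is the dot product of the
# feature's decoder direction with that token's unembedding vector.
def logit_effects(decoder_dir: List[float],
                  unembed: Dict[str, List[float]]) -> Dict[str, float]:
    """Project the decoder direction onto every token's unembedding vector."""
    return {tok: sum(d * u for d, u in zip(decoder_dir, vec))
            for tok, vec in unembed.items()}

def top_bottom(effects: Dict[str, float], k: int
               ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return the k most positive and the k most negative (token, effect) pairs."""
    ranked = sorted(effects.items(), key=lambda kv: kv[1])  # ascending by effect
    bottom = ranked[:k]        # most negative first
    top = ranked[::-1][:k]     # most positive first
    return top, bottom

# Toy data (invented): only the first dimension aligns with the decoder direction.
decoder_dir = [1.0, 0.0]
unembed = {
    "nor": [0.93, 5.0],
    "anybody": [0.92, 1.0],
    "ioned": [-0.73, 0.0],
    "wake": [-0.66, 2.0],
}
top, bottom = top_bottom(logit_effects(decoder_dir, unembed), k=2)
print(top)     # [('nor', 0.93), ('anybody', 0.92)]
print(bottom)  # [('ioned', -0.73), ('wake', -0.66)]
```

In a real model the vocabulary has tens of thousands of tokens, so the same ranking would normally be done with a single matrix-vector product over the full unembedding matrix rather than a Python loop.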
Activation Density: 0.275%
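An activation density of 0.275% means the feature fires on roughly 1 in every 364 tokens (100 / 0.275 ≈ 363.6). The dashboard does not state the exact definition it uses. The sketch below assumes density is the percentage of token positions whose activation is strictly above zero. The function name `activation_density` and its `threshold` parameter are hypothetical.

```python
from typing import List

# Hypothetical sketch: density = percentage of token positions where the
# feature's activation exceeds a threshold (assumed to be 0.0 here).
def activation_density(activations: List[float], threshold: float = 0.0) -> float:
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)

# Invented example: 11 nonzero activations in a 4,000-token sample.
sample = [0.0] * 3989 + [1.2] * 11
print(activation_density(sample))  # 0.275
```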