Explanations
phrases containing descriptions or attributions
phrases that involve descriptions or characterizations of people, events, or concepts
Negative Logits
ettlement    -0.61
abiding      -0.59
Zone         -0.59
ipeg         -0.57
cise         -0.56
acion        -0.54
reproduce    -0.53
olar         -0.53
Lor          -0.53
tein         -0.53
Positive Logits
favorably     0.90
skept         0.87
unfairly      0.86
sarcast       0.71
okingly       0.70
by            0.69
errone        0.69
harshly       0.68
negatively    0.66
ologically    0.66
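The two lists above are the tokens with the most negative and most positive logit scores. As a rough sketch, assuming each vocabulary token already has a single scalar score for this feature (the helper `top_and_bottom` and the small sample mapping are hypothetical, built from a few values shown above), the lists could be selected like this:

```python
import heapq


def top_and_bottom(scores: dict[str, float], k: int) -> tuple[list[str], list[str]]:
    """Return the k highest-scoring and k lowest-scoring tokens.

    Assumption: `scores` maps each token to one scalar logit score for
    the feature; how that score is computed is not covered here.
    """
    # nlargest/nsmallest behave like sorted(...)[:k], so ties keep input order.
    top = heapq.nlargest(k, scores, key=scores.get)
    bottom = heapq.nsmallest(k, scores, key=scores.get)
    return top, bottom


# Hypothetical subset of the scores listed above.
sample = {
    "favorably": 0.90,
    "skept": 0.87,
    "unfairly": 0.86,
    "ettlement": -0.61,
    "ipeg": -0.57,
    "cise": -0.56,
}
print(top_and_bottom(sample, 2))
```

Using `heapq` avoids sorting the whole vocabulary when only a handful of extreme tokens is needed.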
Activation density: 0.153%
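For intuition about the scale of this number: assuming activation density means the percentage of sampled tokens on which the feature's activation is strictly positive, a minimal sketch of the computation follows. The function name and the sample data are hypothetical.

```python
def activation_density(activations: list[float]) -> float:
    """Return the percentage of tokens whose activation is > 0.

    Assumption: "activation density" is read here as the share of
    sampled tokens on which the feature fires (activation > 0).
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > 0)
    return 100.0 * active / len(activations)


# Hypothetical sample: 3 of 2,000 tokens fire.
sample = [0.0] * 1997 + [1.2, 0.4, 2.7]
print(f"{activation_density(sample):.3f}%")  # 0.150%
```

Under this reading, a density of 0.153% means the feature fires on roughly 1.5 tokens per thousand.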