INDEX
Explanations
instances where something is being evaluated or judged
phrases that indicate societal judgments or evaluations
Negative Logits
Ring          -0.72
Jet           -0.71
inas          -0.70
Lawn          -0.67
Planes        -0.67
Observatory   -0.64
Rhythm        -0.62
interpreter   -0.60
Alley         -0.60
vocals        -0.60
Positive Logits
considered    0.99
aimon         0.88
consider      0.87
ilitarian     0.86
phas          0.86
favorably     0.80
agine         0.80
considers     0.77
onyms         0.77
erala         0.75
Activations Density: 0.014%