Explanations
words related to opinions, attitudes, or judgments
terms related to perception and how people interpret actions or events
Negative Logits
Token       Logit
Clicker     -0.65
FTWARE      -0.62
Frenzy      -0.59
ften        -0.53
Waves       -0.53
andel       -0.52
Neural      -0.52
Peb         -0.51
Sebast      -0.51
REM         -0.51
Positive Logits
Token       Logit
favorably   1.12
skept       1.09
as          1.01
negatively  1.00
positively  0.89
unfairly    0.89
differently 0.87
harshly     0.84
tical       0.81
derog       0.80
Activations Density: 0.099%