Explanations
criticisms and opinions about people or entities
expressions of criticism and evaluation
Negative Logits
ajo        -0.64
claimer    -0.62
ipeg       -0.60
planes     -0.59
abiding    -0.57
CLS        -0.55
Milky      -0.54
Lies       -0.53
tein       -0.53
ent        -0.53
Positive Logits
favorably  0.85
skept      0.83
psychiat   0.78
maxwell    0.77
}}}        0.70
unfairly   0.70
quet       0.69
Ú          0.69
igated     0.68
Versions   0.66
Activations Density: 0.158%