Explanations
abstract states or qualities
Negative Logits
shapes           0.68
shoes            0.64
pieces           0.63
tabla            0.63
independently    0.61
shape            0.60
recta            0.60
tablas           0.59
Shape            0.59
shaped           0.59
Positive Logits
nationalism      1.15
bullying         0.98
censorship       0.93
windowActionBar  0.90
Nationalism      0.89
optimism         0.88
sarcasm          0.87
patriotism       0.87
ャ               0.85
hypocrisy        0.84
Activations Density: 0.586%