Explanations
words related to items or concepts that are of a lower value or rank
references to the word "low" in various contexts
Negative Logits
tnc          -0.94
andise       -0.81
itutional    -0.69
ADRA         -0.67
andum        -0.67
Orient       -0.67
illon        -0.66
ilage        -0.65
ophers       -0.65
Pengu        -0.64
Positive Logits
ered         0.91
est          0.90
enthal       0.88
ball         0.83
hanging      0.81
(<           0.78
enough       0.77
ppy          0.77
pitched      0.76
down         0.76
Activations Density: 0.039%