Explanations
mentions of the adjective "low" in various contexts
references to low-cost or low-level attributes
Negative Logits
tnc                  -0.86
andise               -0.85
itutional            -0.71
Orient               -0.69
indal                -0.68
ophers               -0.68
Ashe                 -0.67
Mahjong              -0.67
arium                -0.66
natureconservancy    -0.66
Positive Logits
enough               0.86
est                  0.85
hanging              0.83
pitched              0.82
enthal               0.82
ball                 0.79
ered                 0.75
ppy                  0.75
(<                   0.74
ened                 0.74
Activations Density: 0.029%