INDEX
Explanations
mentions of importance or significance
references to the concept of importance
Negative Logits
Wonderland    -0.84
TERN          -0.70
Labyrinth     -0.68
Flesh         -0.65
Sheep         -0.64
Snow          -0.63
dream         -0.63
Flat          -0.62
Jelly         -0.62
Monstrous     -0.61
Positive Logits
importance    1.29
proble        1.13
significance  0.93
tremend       0.89
uese          0.87
traged        0.87
xual          0.86
olicy         0.86
alore         0.85
notation      0.85
Activations Density: 0.012%