INDEX
Explanations
references to gems or jewels
NEGATIVE LOGITS
OGND          -0.56
controllable  -0.51
langs         -0.49
Rourke        -0.48
Crouch        -0.47
controlling   -0.46
controlled    -0.45
Nasty         -0.45
controlled    -0.45
rån           -0.45

POSITIVE LOGITS
Gems          1.06
Gem           1.05
gems          1.03
GEM           1.01
Gem           0.97
gem           0.97
Gems          0.90
gems          0.87
Gemma         0.84
gemstone      0.83
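A minimal sketch of how positive and negative logit lists like the two above are commonly produced for a sparse-autoencoder feature. It assumes (this page does not say so) that each score is the dot product of the feature's decoder direction with one column of the model's unembedding matrix, and that the lists are simply the highest and lowest of those scores. The function `logit_attribution` and the parameters `decoder_dir`, `unembed`, `vocab`, and `k` are hypothetical names, and the toy numbers are made up for illustration.

```python
from typing import List, Tuple

Scored = List[Tuple[str, float]]


def logit_attribution(
    decoder_dir: List[float],
    unembed: List[List[float]],
    vocab: List[str],
    k: int,
) -> Tuple[Scored, Scored]:
    """Return (top-k positive, top-k negative) tokens for one feature.

    Hypothetical sketch: unembed[i][j] is the weight from residual-stream
    dimension i to vocabulary token j (shape d_model x n_vocab).
    """
    scores: Scored = []
    for j, token in enumerate(vocab):
        # Direct effect of the feature's decoder direction on token j's logit.
        s = sum(decoder_dir[i] * unembed[i][j] for i in range(len(decoder_dir)))
        scores.append((token, s))
    ranked = sorted(scores, key=lambda t: t[1])  # ascending by score
    negative = ranked[:k]                        # most negative first
    positive = ranked[::-1][:k]                  # most positive first
    return positive, negative


if __name__ == "__main__":
    # Toy example: 2-dimensional residual stream, 3-token vocabulary.
    pos, neg = logit_attribution(
        decoder_dir=[1.0, 0.0],
        unembed=[[0.5, -0.2, 1.0], [3.0, 3.0, 3.0]],
        vocab=["a", "b", "c"],
        k=1,
    )
    print(pos)  # [('c', 1.0)]
    print(neg)  # [('b', -0.2)]
```

Sorting the full score list once and slicing both ends keeps the sketch simple; for a real vocabulary of tens of thousands of tokens, a partial top-k selection would be cheaper.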
Activation density: 0.004%
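Under the usual convention (an assumption here, since the page does not define it), activation density is the percentage of sampled tokens on which the feature's activation is above zero. A value of 0.004% then corresponds to about 4 firing tokens per 100,000. A minimal sketch, where the function name `activation_density` and the `threshold` parameter are hypothetical:

```python
from typing import Sequence


def activation_density(activations: Sequence[float], threshold: float = 0.0) -> float:
    """Percentage of tokens whose activation exceeds `threshold` (hypothetical helper)."""
    if not activations:
        raise ValueError("need at least one activation")
    fired = sum(1 for a in activations if a > threshold)
    return 100.0 * fired / len(activations)


if __name__ == "__main__":
    # 4 firing tokens out of 100,000 gives a density of about 0.004%.
    sample = [0.0] * 99_996 + [1.2, 0.7, 3.1, 0.4]
    print(f"{activation_density(sample):.3f}%")  # 0.004%
```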