INDEX
Explanations
expressions related to interest or curiosity
Negative Logits
Caw               -0.57
utafitiHapana     -0.50
Goy               -0.50
Rüyada            -0.49
RetentionPolicy   -0.49
giù               -0.48
glyph             -0.47
jaws              -0.46
Lawton            -0.46
tasche            -0.46
Positive Logits
Interest          1.04
Interest          0.94
interest          0.91
interest          0.89
Interests         0.84
INTEREST          0.77
interested        0.77
INTEREST          0.76
interests         0.73
Interested        0.73
Activations Density: 0.105%