INDEX
Explanations
references to the word "rum" at various strengths
mentions of "rum."
Negative Logits
lihood      -0.85
Parenthood  -0.84
KI          -0.65
Shades      -0.64
Kut         -0.61
Wildcats    -0.61
Egyptians   -0.61
Stafford    -0.61
Cancel      -0.61
Merit       -0.60
Positive Logits
inating     1.01
rum         1.00
oured       0.99
ble         0.98
atis        0.93
pled        0.91
rum         0.85
inally      0.84
inate       0.83
ination     0.83
Activations Density: 0.009%