Explanations
phrases describing or recounting events and actions
instances of the word "described" in various contexts
Negative Logits
alde      -0.67
surpassed -0.63
valid     -0.61
ãģ¨       -0.61
aceous    -0.60
ï¸ı       -0.60
iest      -0.60
ãĥ©       -0.60
Phys      -0.59
fort      -0.59
Positive Logits
igate         0.84
igating       0.84
igated        0.78
artz          0.75
himself       0.73
difficulties  0.70
themselves    0.66
him           0.65
herself       0.65
similarities  0.64
Activations Density 0.081%