INDEX
Explanations
phrases indicating understanding, agreement, or comprehension
statements of comprehension or understanding
Negative Logits
rouse   -0.76
woods   -0.72
rock    -0.70
strip   -0.69
die     -0.68
metal   -0.67
onies   -0.67
Ranked  -0.67
pload   -0.65
endar   -0.64
Positive Logits
understands  0.74
Duc          0.71
ĺħ           0.71
ible         0.71
Understand   0.70
Stafford     0.70
ably         0.69
Languages    0.68
ances        0.67
iotic        0.67
Activations Density: 0.034%