INDEX
Explanations
variations of the word "roll" in different contexts
Negative Logits
rend       -0.71
TAG        -0.69
eal        -0.67
iem        -0.67
yrinth     -0.66
len        -0.64
comprom    -0.64
ld         -0.64
unction    -0.63
oppers     -0.63
Positive Logits
out        0.77
prevail    0.64
onward     0.63
Out        0.61
numbered   0.59
baugh      0.57
down       0.57
out        0.57
tered      0.56
thunder    0.56
Activations Density: 0.022%