INDEX
Explanations
instances of the word "sitting"
Negative Logits
iler      -0.75
ctr       -0.72
endish    -0.71
Whit      -0.70
raid      -0.68
positive  -0.68
olini     -0.66
ilers     -0.66
escal     -0.66
obs       -0.65
Positive Logits
sitting      1.01
toget        0.98
seiz         0.87
duck         0.86
stairs       0.84
horizont     0.83
Sitting      0.83
room         0.82
shenan       0.81
escription   0.80
Activations Density: 0.008%