INDEX
Explanations
the word "Mirror" followed by a number from 9 to 10
mentions of the term "Mirror" in various contexts
Negative Logits
ourke      -0.73
yright     -0.70
reat       -0.67
ales       -0.67
reek       -0.67
ential     -0.66
athetic    -0.65
RAFT       -0.65
adding     -0.64
akings     -0.64
Positive Logits
Mirror      1.29
ror         0.88
istg        0.80
mirror      0.76
Scroll      0.74
Divinity    0.71
Mir         0.70
discrep     0.70
wip         0.69
angelo      0.67
Activations Density 0.005%