Explanations
mentions of the word "iter" at a high activation level
instances of the term "iter" in various contexts
Negative Logits
ership      -0.91
¥µ          -0.75
ecause      -0.75
itionally   -0.72
¬¼          -0.71
ruciating   -0.70
sterdam     -0.70
cipled      -0.69
orld        -0.68
yssey       -0.67
Positive Logits
ariat       0.93
ocene       0.85
iter        0.81
imation     0.78
imate       0.78
aton        0.77
ror         0.75
asury       0.75
ATURE       0.75
ason        0.74
Activations Density 0.023%