INDEX
Explanations
the word "Lo" with the highest activation value
the recurring mention of the name "Lo."
Negative Logits
ãĥ¥          -0.73
aur          -0.71
patrick      -0.70
ivery        -0.68
ahime        -0.67
orpor        -0.66
ãģĨ          -0.65
phosphate    -0.65
atomic       -0.62
edge         -0.60
Positive Logits
Lo           3.63
Lo           2.78
lo           1.71
LO           1.64
lo           1.63
LO           1.29
Ho           1.26
Loop         1.22
Ro           1.13
Lot          1.10
Activations Density: 0.016%