INDEX
Explanations
conditional phrases that suggest potential actions or outcomes
Negative Logits
arge     -0.15
gal      -0.15
otal     -0.14
gun      -0.14
leta     -0.14
ety      -0.14
ella     -0.14
ego      -0.14
ropri    -0.14
Inner    -0.14
Positive Logits
iliz     0.15
Fx       0.15
jav      0.14
یدن      0.14
이는     0.14
alet     0.14
677      0.14
lac      0.13
ماد      0.13
675      0.13
Activations Density 0.020%