INDEX
Explanations
phrases related to initiating, causing, or starting something
phrases indicating causation or conditions leading to significant outcomes
Negative Logits
.",       -0.81
?",       -0.70
",        -0.70
orsi      -0.70
!",       -0.64
â̦."     -0.63
(?,       -0.62
(£        -0.61
.?        -0.61
',        -0.60
Positive Logits
ãĥĩãĤ£    0.78
ãĥ¥       0.74
winds     0.66
ãĥĭ       0.62
voy       0.62
VIDE      0.60
arently   0.59
ãĥĨãĤ£    0.58
arks      0.58
ãĥİ       0.56
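Several of the tokens listed above (for example ãĥĩãĤ£ or ãĥ¥) look like the printable stand-ins that GPT-2-style byte-level BPE vocabularies use for raw UTF-8 bytes. The page does not say which tokenizer produced them, so the following is only a sketch under that assumption; `decode_token` is a hypothetical helper name, and the byte table reproduces the GPT-2 convention rather than anything confirmed by this page.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Byte -> printable character table in the GPT-2 byte-level BPE style (assumed)."""
    # Bytes that are already printable map to themselves.
    bs = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (control characters, space, etc.) are shifted to U+0100 and up.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, map(chr, cs)))


# Inverse table: printable stand-in character -> original byte value.
UNICODE_TO_BYTE = {ch: b for b, ch in bytes_to_unicode().items()}


def decode_token(token: str) -> str:
    """Turn a vocabulary string back into the text it encodes (hypothetical helper)."""
    raw = bytes(UNICODE_TO_BYTE[ch] for ch in token)
    # A single token may hold only part of a multi-byte character, hence errors="replace".
    return raw.decode("utf-8", errors="replace")
```

If the assumption holds, `decode_token` applied to ãĥĩãĤ£ would give the katakana ディ, and ãĥ¥, ãĥĭ, ãĥĨãĤ£ and ãĥİ would give ュ, ニ, ティ and ノ, while plain ASCII tokens such as winds come back unchanged.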
Activations Density 0.464%