Explanations
words related to challenges, controversy, and risk
punctuation marks and their usage in the text
Negative Logits
Token       Logit
ãĤ´ãĥ³      -0.90
izen        -0.67
args        -0.64
ãĥį         -0.63
ãĥ¯         -0.62
¬¼          -0.61
iren        -0.61
atars       -0.60
isi         -0.60
acci        -0.60
Positive Logits
Token       Logit
yeah        1.23
whereas     1.07
uh          0.98
blah        0.97
frankly     0.96
[           0.94
obviously   0.94
basically   0.94
because     0.92
but         0.90
Activations Density 0.333%
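
The page does not define how the density figure is computed. A common reading, assumed here, is the fraction of sampled tokens on which the feature's activation is above zero. The sketch below illustrates that reading with made-up activations; `activation_density` and the sample data are hypothetical and not taken from the dashboard.

```python
def activation_density(activations, threshold=0.0):
    """Fraction of tokens whose activation exceeds `threshold`.

    Assumption: "density" means the share of tokens where the feature fires.
    """
    if not activations:
        raise ValueError("need at least one activation value")
    active = sum(1 for a in activations if a > threshold)
    return active / len(activations)


# Illustrative data only: the feature fires on 1 of 300 tokens.
acts = [0.0] * 299 + [2.5]
print(f"{activation_density(acts):.3%}")
```

With one active token in 300, this definition gives the same order of magnitude as the reported value, which suggests the feature fires on roughly one token in three hundred.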