INDEX
Explanations
text within square brackets that expresses opinions or attributions
quoted speech or dialogue within the text
Negative Logits
Tang      -0.59
Miss      -0.55
Lens      -0.53
Rider     -0.53
family    -0.53
Gate      -0.52
normal    -0.51
castle    -0.50
‐         -0.48
2018      -0.48
Positive Logits
"[      2.82
"[      2.60
"(      1.89
'[      1.74
"(      1.71
"'      1.65
"…      1.52
"{      1.50
"#      1.48
"...    1.45
Activations Density 0.018%