Explanations
proper nouns or names of individuals
instances of the word "said," indicating reported speech or quotations
Negative Logits
Token       Logit
figure      -0.70
wed         -0.62
ÂŃ          -0.61
clad        -0.60
enburg      -0.60
pees        -0.58
inund       -0.57
eland       -0.56
anu         -0.56
izable      -0.56
Positive Logits
Token       Logit
goodbye     0.85
hello       0.83
:           0.82
=\"         0.81
escription  0.77
.:          0.74
Hello       0.74
:           0.72
:-          0.71
:]          0.69
Activations Density 0.047%
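
The density figure above is usually read as the share of token positions on which the feature fires at all. The page does not say how it was computed, so the following is only a minimal sketch under that assumption: the function name `activation_density`, the zero threshold, and the sample data are all hypothetical.

```python
def activation_density(activations, threshold=0.0):
    """Return the percentage of token positions whose activation exceeds threshold.

    Assumption: "density" means the fraction of positions with a nonzero
    (above-threshold) activation. The dashboard's exact definition is not
    stated on the page.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


# Hypothetical data: 10,000 token positions, 5 of which activate the feature.
sample = [0.0] * 9995 + [1.2, 0.4, 3.1, 0.9, 2.2]
density = activation_density(sample)
print(f"{density:.3f}%")
```

Under this reading, a density of 0.047% would correspond to the feature firing on roughly 47 out of every 100,000 token positions.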