Explanations
references to organizations, events or places
Negative Logits
Token    Logit
é¾į      -0.73
Ł        -0.72
Ķ        -0.70
uci      -0.70
role     -0.69
TeX      -0.67
]=       -0.67
ibo      -0.67
Throw    -0.66
rase     -0.66
Positive Logits
Token        Logit
however      0.73
meanwhile    0.63
Schn         0.63
Gil          0.62
Kap          0.62
Goldstein    0.62
though       0.60
atever       0.59
Gus          0.59
huh          0.58
Activations Density: 0.318%