INDEX
Explanations
party, advice, maxi, current, help
Negative Logits
“[       1.06
(“       1.03
(...)    1.03
[...]    1.02
(...)    0.96
["       0.94
[        0.93
("       0.92
“        0.91
"[       0.91
Positive Logits
allah    0.88
flew     0.84
jesus    0.84
july     0.83
knew     0.82
music    0.80
bike     0.80
june     0.80
vrouw    0.80
woman    0.79
Activations Density 0.187%