INDEX
Explanations
mentions of the word "aba" at varying levels of emphasis
mentions of a specific name or term
Negative Logits
ivities    -0.80
lv         -0.76
wise       -0.75
holders    -0.72
ndra       -0.72
worthy     -0.72
rees       -0.70
rider      -0.69
etts       -0.68
cards      -0.66
Positive Logits
aba        0.98
ipal       0.76
ãĤ§        0.76
oded       0.74
ifiable    0.73
isi        0.73
Allah      0.71
verning    0.70
uthor      0.70
ouk        0.70
Activations Density: 0.022%