Explanations
patterns or structures in tabular data
Negative Logits
erson       -0.19
onder       -0.17
ipel        -0.15
ippo        -0.15
orra        -0.15
ughters     -0.14
115         -0.14
Nicholson   -0.14
essen       -0.13
prung       -0.13
Positive Logits
ogs         0.15
zin         0.15
æ¨          0.15
zas         0.15
lsa         0.14
uble        0.14
ê¸          0.14
lap         0.14
itm         0.14
zda         0.13
Activations Density: 0.006%