Explanations
historical names and terms
Negative Logits
ndra     -0.69
owe      -0.69
abwe     -0.67
steen    -0.66
oso      -0.63
akeru    -0.62
nyder    -0.62
ghai     -0.60
ippi     -0.59
Kaf      -0.59
Positive Logits
ulhu     0.90
INAL     0.86
ishops   0.72
©¶æ      0.71
agos     0.66
eal      0.65
®        0.64
Truth    0.64
final    0.64
ROM      0.63
Activations Density: 0.059%