INDEX
Explanations
names of individuals or specific entities
Negative Logits
Token      Logit
%:         -0.65
hart       -0.65
rylic      -0.63
iatrics    -0.61
xious      -0.61
NIC        -0.60
CARE       -0.59
ware       -0.58
ghai       -0.57
Helpful    -0.57
Positive Logits
Token      Logit
clerosis   1.09
heet       1.04
ourced     0.92
aurus      0.91
ourcing    0.90
ions       0.87
ources     0.87
ophical    0.86
atile      0.83
lav        0.83
Activations Density: 0.024%