INDEX
Explanations
references to specific individuals and their perceived characteristics or accomplishments
Negative Logits
catentry            -0.82
natureconservancy   -0.77
intrusion           -0.71
Role                -0.68
glove               -0.68
inconsistency       -0.66
timeline            -0.66
instr               -0.65
clause              -0.65
hospitality         -0.65
Positive Logits
icons                0.88
realizes             0.80
suffers              0.79
faces                0.76
mates                0.74
realise              0.73
bers                 0.71
understands          0.71
realised             0.70
herty                0.70
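Logit lists like the two above are commonly obtained by projecting a feature's decoder direction through the model's unembedding matrix and reading off the most negative and most positive vocabulary entries. The sketch below shows that idea under stated assumptions: the helper name `top_logits`, the random stand-in matrices, and the toy vocabulary are all hypothetical, and any normalization the dashboard applies is not reproduced here.

```python
import numpy as np

def top_logits(w_dec_row, W_U, vocab, k=10):
    """Hypothetical helper: rank tokens by the direct effect of one feature's
    decoder direction (shape d_model) on the unembedding (d_model x d_vocab)."""
    logits = w_dec_row @ W_U          # one score per vocabulary token
    order = np.argsort(logits)        # ascending: most negative first
    neg = [(vocab[i], float(logits[i])) for i in order[:k]]
    pos = [(vocab[i], float(logits[i])) for i in order[::-1][:k]]
    return neg, pos

# Synthetic stand-ins (assumption): random weights, not a real model.
rng = np.random.default_rng(0)
d_model, d_vocab = 16, 50
W_U = rng.standard_normal((d_model, d_vocab))
w_dec_row = rng.standard_normal(d_model)
vocab = [f"tok{i}" for i in range(d_vocab)]

neg, pos = top_logits(w_dec_row, W_U, vocab, k=5)
for tok, val in pos:
    print(f"{tok:<8}{val:+.2f}")
```

Because only the decoder direction and the unembedding enter the computation, these lists describe what the feature pushes toward or away from directly, not what makes it fire.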
Activation Density: 0.184%
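Activation density is usually read as the fraction of token positions on which the feature fires. A minimal sketch, assuming "fires" means an activation strictly above zero (the threshold and the helper name `activation_density` are assumptions); the activation array is synthetic, with 184 nonzero entries out of 100,000 chosen only to reproduce a figure of the size shown above.

```python
import numpy as np

def activation_density(acts, threshold=0.0):
    """Hypothetical helper: fraction of positions whose activation exceeds threshold."""
    acts = np.asarray(acts)
    return float(np.mean(acts > threshold))

# Synthetic activations (assumption): 184 active positions out of 100,000.
acts = np.zeros(100_000)
acts[:184] = 1.0
density = activation_density(acts)
print(f"{density:.3%}")  # prints 0.184%
```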