Explanations
adjectives or phrases that express an evaluation or judgment
statements that assert characteristics or qualities, often using the word "is" to introduce descriptions or evaluations
Negative Logits
Token        Logit
inders       -0.76
rike         -0.70
Chero        -0.66
ummies       -0.65
gg           -0.63
ffe          -0.63
nikov        -0.62
cess         -0.58
arty         -0.58
hero         -0.57
Positive Logits
Token          Logit
incidentally    0.85
presumably      0.83
admittedly      0.73
ironically      0.71
ometimes        0.70
PK              0.69
baugh           0.69
coincided       0.66
?)              0.65
culminated      0.65
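The negative and positive logit lists above are ranked views of one per-token weight vector. The sketch below shows that ranking step, under an assumption that the page does not state: that each value is the feature's direct effect on the corresponding output token (for example, its decoder direction projected through the model's unembedding). The function name `top_and_bottom_logits` and the toy vocabulary are hypothetical. The near-zero token " the" is made up to show that weak tokens fall out of both lists.

```python
# Hypothetical sketch: rank vocabulary tokens by a feature's logit weight.
# Assumption: logit_weights[i] is the feature's direct effect on token vocab[i];
# the dashboard's actual computation may differ.

def top_and_bottom_logits(vocab, logit_weights, k=10):
    """Return the k most positive and k most negative (token, weight) pairs."""
    if len(vocab) != len(logit_weights):
        raise ValueError("vocab and logit_weights must have the same length")
    pairs = list(zip(vocab, logit_weights))
    # Largest weights first for the positive list.
    positive = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    # Smallest (most negative) weights first for the negative list.
    negative = sorted(pairs, key=lambda p: p[1])[:k]
    return positive, negative


if __name__ == "__main__":
    # Toy input: four values from the lists above plus one made-up token.
    vocab = ["incidentally", "presumably", "inders", "rike", " the"]
    weights = [0.85, 0.83, -0.76, -0.70, 0.01]
    pos, neg = top_and_bottom_logits(vocab, weights, k=2)
    print(pos)  # [('incidentally', 0.85), ('presumably', 0.83)]
    print(neg)  # [('inders', -0.76), ('rike', -0.7)]
```

A full sort is the simplest choice here. For a large vocabulary, `heapq.nlargest` and `heapq.nsmallest` return the same top-k pairs without sorting everything.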
Activation density: 0.358%
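Activation density is usually read as the share of sampled tokens on which the feature fires. The sketch below assumes "fires" means an activation strictly above 0. The page does not state the threshold or the token sample, so both are assumptions, and `activation_density` is a hypothetical helper. For scale, 0.358% corresponds to roughly 358 active tokens per 100,000.

```python
# Hypothetical sketch: activation density as the percentage of tokens whose
# activation is strictly above a threshold (assumed to be 0 here).

def activation_density(activations, threshold=0.0):
    """Percentage of activations strictly greater than `threshold`."""
    if not activations:
        raise ValueError("activations must be non-empty")
    active = sum(1 for a in activations if a > threshold)
    return 100.0 * active / len(activations)


if __name__ == "__main__":
    # Toy sample: 358 active tokens out of 100,000.
    acts = [1.0] * 358 + [0.0] * (100_000 - 358)
    print(f"{activation_density(acts):.3f}%")  # 0.358%
```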