INDEX
Explanations
adjectives relating to personal characteristics or attributes
terms related to personal matters and health issues
Negative Logits
ÅĤ            -0.70
risome        -0.69
seams         -0.67
Arrows        -0.66
phase         -0.62
д             -0.61
ynt           -0.60
Writer        -0.59
sterdam       -0.58
sea           -0.58
Positive Logits
pronouns      0.83
pronoun       0.71
£ı            0.70
hobbies       0.66
atures        0.66
ttle          0.66
UNCLASSIFIED  0.66
styles        0.65
identifiable  0.64
qui           0.63
Activations Density 0.225%