Explanations
instances where things are described as contrasting or conflicting
phrases that describe contrast or comparison between different states or experiences
Negative Logits
imar      -0.70
thora     -0.64
height    -0.61
Hispanic  -0.60
76561     -0.60
obia      -0.60
Bulgar    -0.59
BSD       -0.59
advant    -0.58
Ĭ±        -0.58
Positive Logits
actual     1.24
tangible   0.99
actual     0.98
real       0.95
offline    0.91
Actual     0.90
reality    0.87
courtroom  0.85
everyday   0.83
onstage    0.81
Activations Density 0.559%