INDEX
Explanations
instances of the word "tolerance" or related terms
references to concepts of tolerance and acceptance
Negative Logits
prints        -0.79
tein          -0.73
Downloadha    -0.69
call          -0.68
grave         -0.67
wind          -0.64
(blank)       -0.64
guard         -0.63
eu            -0.62
wer           -0.62
Positive Logits
tolerant      1.05
tolerance     1.00
toler         0.98
tolerate      0.98
olerance      0.91
intolerance   0.90
intoler       0.87
tolerated     0.81
olini         0.72
terness       0.72
Activations Density: 0.022%